key: cord-0957900-t1xjy12y authors: Nazneen Akhand, Mst Rubaiat; Azim, Kazi Faizul; Hoque, Syeda Farjana; Moli, Mahmuda Akther; Joy, Bijit Das; Akter, Hafsa; Afif, Ibrahim Khalil; Ahmed, Nadim; Hasan, Mahmudul title: Genome based Evolutionary study of SARS-CoV-2 towards the Prediction of Epitope Based Chimeric Vaccine date: 2020-04-15 journal: bioRxiv DOI: 10.1101/2020.04.15.036285 sha: 323f417a74c9a82ec74c360eb64d7cf8fe1ba36c doc_id: 957900 cord_uid: t1xjy12y
SARS-CoV-2 is known to infect the neurological, respiratory, enteric, and hepatic systems of humans and has already become an unprecedented threat to the global healthcare system. COVID-19, the most serious public health condition caused by SARS-CoV-2, has led the world into uncertainty alongside thousands of deaths occurring regularly. The unavailability of specific therapeutics or an approved vaccine has made recovery from COVID-19 more troublesome and challenging. The present in silico study aimed to predict a novel chimeric vaccine by simultaneously targeting four major structural proteins via the establishment of ancestral relationships among different strains of coronaviruses. Conserved regions from the homologous protein sets of spike glycoprotein (S), membrane protein (M), envelope protein (E) and nucleocapsid protein (N) were identified through multiple sequence alignment. Phylogenetic analyses of the whole genome and of the four proteins (S, E, M and N) reflected the close ancestral relation of SARS-CoV-2 to SARS-CoV-1 and bat coronavirus. Numerous immunogenic epitopes (both T cell and B cell) were generated from the common fragments and were further ranked on the basis of antigenicity, transmembrane topology, conservancy level, toxicity, allergenicity pattern and population coverage analysis. Top putative epitopes were combined with appropriate adjuvants and linkers to construct a novel multiepitope subunit vaccine against COVID-19.
The designed constructs were characterized based on physicochemical properties, allergenicity, antigenicity and solubility, which revealed the superiority of construct V3 in terms of safety and efficacy. Essential molecular dynamics and normal mode analysis confirmed minimal deformability of the refined model at the molecular level. In addition, disulfide engineering was investigated to enhance the stability of the protein. Molecular docking study indicated high binding affinity between construct V3 and HLA alleles, as well as with different host receptors. Microbial expression and translational efficacy of the constructs were checked using the pET28a(+) vector in E. coli strain K12. The present study might aid the development of preventive measures to combat COVID-19 infections. However, in vivo and in vitro validation through wet lab trials using model animals is needed before the presented data can be implemented.
A novel coronavirus named SARS-CoV-2/2019-nCoV was identified at the end of 2019 in Wuhan, a city in the Hubei province of China, causing severe pneumonia that led to a large number of deaths. Gradually this virus emerged as a new threat to the whole world, affecting almost all parts of it. To date, the pathogen has affected 198 countries, thus becoming a global public health emergency. A global public health concern with the pandemic notion of COVID-19 was declared on January 30th, 2020 by the World Health Organization (WHO, 2020). Again, an adverse situation was announced on 13 March 2020 owing to the increasing infections of COVID-19 (Kunz and Minder, 2020). By April 10, 2020, the total number of infected people around the world exceeded 1,633,272 and more than 97,601 had died, while 366,610 people had fully recovered from the infection (WHO, 2020). The alarming situation is that the number of confirmed cases worldwide has exceeded one million by this time.
It took more than three months to reach the first 10000 confirmed cases, while it required only 12 days to detect the next 100000 cases. The situation is getting worse in the European region. Total deaths in Italy, Spain, USA, France and the United Kingdom were 14681, 11744, 7847, 6507 and 4313 respectively (as of April 4, 2020), and these numbers are increasing day by day (WHO, 2020). Some common clinical manifestations of COVID-19 are fever, sputum production, shortness of breath, cough, fatigue, sore throat and headache, which can lead to severe cases of pneumonia. A few patients also have gastrointestinal symptoms with diarrhea and vomiting (Guan et al., 2019). Though several early studies showed that the mortality rate for SARS-CoV-2 is not as high (2-3%), the latest global death rate for COVID-19 is 3.4%, which indicates an increasing trend. The investigation of the Chinese Center for Disease Control and Prevention (2020) revealed that the prevalence of COVID-19 is more apparent in the 50-year age group than in the lower age groups (Jeong-ho et al., 2020). High fever and lymphocytopenia were found to be more common in COVID-19, though the frequency of patients without fever is also higher than in the earlier outbreaks caused by SARS-CoV (1%) and MERS-CoV (2%) (Chen et al., 2020). SARS-CoV-2 is a betacoronavirus that has a positive-sense, single-stranded RNA molecule, 26-32 kb in length, as its genetic material and belongs to the family Coronaviridae, order Nidovirales (Hui et al., 2019). It shares genome similarity with SARS-CoV (79.5%) and bat coronavirus (96%) (Zhu et al., 2020). However, the vector or carrier of SARS-CoV-2 remains unclear, though its detection was primarily linked to Wuhan's Huanan Seafood Wholesale Market (Lu et al., 2020; WHO, 2020).
Though SARS-CoV-1 and bat coronavirus share sufficient sequence similarities with COVID-19, the known mechanism of infection of the host and the death rate are quite different in the case of the novel coronavirus. In addition, there is an evolutionary distance between SARS-CoV-1 and bat coronavirus as well as COVID-19 (Hu et al., 2018; Wu, 2020b). Because of the high sequence variability of the pathogen, many of the efforts that have been undertaken to develop a vaccine against SARS-CoV-2 remain unsuccessful. Therefore, there is an urgent need to develop vaccines against SARS-CoV-2 based on an understanding of the actual evolutionary ancestral relationship. While some natural metabolites and traditional medications may provide comfort and take the edge off a few symptoms of COVID-19, there is no proof that existing treatment procedures can effectively combat the disease (WHO, 2020). However, inactivated or live-attenuated forms of pathogenic organisms are usually recommended for the initiation of antigen-specific responses that alleviate or reduce the possibility of the host experiencing secondary infections (Thompson & Staats, 2011). Moreover, not all proteins are usually targeted for protective immunity; only a small number of proteins are necessary, depending on the microbe (Tesh et al., 200, Li et al., 2014). Depending on sufficient antigen expression from experimental assays, a traditional vaccine could take 15 years to develop, and can sometimes lead to undesirable consequences (Purcell et al., 2007; Petrovsky & Aguilar, 2004). The reverse vaccinology approach, on the other hand, is an effective way to develop a vaccine against COVID-19. In this method, computational analysis of the genomic architecture of a pathogenic candidate can predict the antigens of the pathogen without the prerequisite of culturing the pathogen under lab conditions.
Vaccines against a few pathogens that have so far resisted effective vaccine development may become possible through such an approach (Rappuoli, 2000), which initiated a major shift in the development of vaccines against deadly pathogens. The strategy includes the comprehensive utilization of bioinformatics algorithms or tools to develop epitope-based vaccine molecules, though further validation and experimental procedures are also needed (Moxon et al., 2019). In addition, peptide-based subunit vaccines are biologically safer due to the absence of continuous in vitro culture during the production period, and also imply an appropriate activation of immune responses (Purcell et al., 2007; Dudek et al., 2010). Such immunoinformatic approaches have already been employed by researchers to design vaccines against a number of deadly pathogens including Ebola virus (Khan et al., 2015), HIV (Pandey et al., 2018), Arenaviruses, Marburgvirus, Norwalk virus (Azim et al., 2019b), Nipah virus (Saha et al., 2017), influenza virus (Hasan et al., 2019b) and so on. At present, a suitable peptide vaccine against SARS-CoV-2 that could efficiently generate a sufficient immune response to destroy the virus is urgently needed. Hence, the study was designed to develop a chimeric recombinant vaccine against COVID-19 by targeting four major structural proteins of the pathogen, while revealing the evolutionary history of different species of coronavirus based on whole genome and protein domain-based phylogeny.
Complete genomes of COVID-19 and other coronaviruses were retrieved from the NCBI (https://www.ncbi.nlm.nih.gov/), using the keyword 'coronavirus' and the search option 'nucleotide'. A total of 61 complete genomes with unique identity were retrieved (Supplementary File 1). Protein sequences of the spike, envelope, membrane and nucleocapsid proteins were also retrieved from the corresponding genome sequences found in NCBI (Supplementary File 1).
The complete genome sequences of coronaviruses and the spike, envelope, membrane and nucleocapsid protein sequences were employed to construct different phylogenetic trees. Multiple sequence alignment (MSA) of the complete genome and protein sequences was performed using the MAFFT v7.310 tool (Katoh & Standley, 2013). For the whole genome alignment, we used the MAFFT Auto algorithm, while for the protein sequence alignment, the MAFFT G-INS-i algorithm was used with default parameters. Next, the alignment was visualized using JalView-2.11 (Waterhouse et al., 2009). Alignment positions with more than 50% gaps were pruned from the coronavirus genome alignment using the Phyutility 2.2.6 program (Smith & Dunn, 2008). Likewise, positions with more than 20% gaps were removed from the spike protein alignment. PartitionFinder-2.1.1 (Lanfear et al., 2017) indicated the best-fit substitution model for the complete genome sequences and the protein sequences. The phylogeny of the whole genome sequences of coronaviruses was constructed using both the Maximum Likelihood method and the Bayesian method. RAxML version 8.2.11 (Stamatakis, 2014) with the substitution model GTRGAMMAI was run with 1000 rapid bootstrap replicates. MrBayes version 3.2.6 (Ronquist et al., 2012) with the INVGAMMA model was used for the coronavirus genomes. Phylogenetic analyses of the four different protein sequences were performed using the RAxML-8.2.11 tool. For both the spike and nucleocapsid proteins, we found PROTGAMMAIWAG to be the best-fit model. Again, PROTGAMMAWAG was the best-fit model of evolution for both the membrane and envelope proteins. The InterPro database (https://www.ebi.ac.uk/interpro/) was utilized for the retrieval of the domain sequences of the stated protein sequences. Finally, the Interactive Tree of Life (iTOL; EMBL, Heidelberg, Germany) was used for the visualization of the phylogenetic trees. All the trees were rooted at the midpoint.
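The gap-pruning step described above (dropping alignment positions in which more than a given fraction of sequences carry a gap) can be illustrated with a minimal Python sketch. This is not the Phyutility implementation; the function name `prune_gappy_columns`, the `-` gap character and the toy alignment are assumptions made purely for illustration.

```python
def prune_gappy_columns(alignment, max_gap_fraction):
    """Drop alignment columns whose gap fraction exceeds max_gap_fraction.

    `alignment` is a list of equal-length aligned sequences (strings) in
    which '-' marks a gap. Hypothetical helper, not part of Phyutility.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n_seqs = len(alignment)
    # Keep a column only if its gap fraction does not exceed the threshold.
    keep = [
        col
        for col in range(length)
        if sum(seq[col] == "-" for seq in alignment) / n_seqs <= max_gap_fraction
    ]
    return ["".join(seq[col] for col in keep) for seq in alignment]


# Toy alignment (made-up sequences, not data from the study).
toy_alignment = ["ATG-C-", "A-G-CT", "ATGACT", "AT--C-"]
pruned_genome_style = prune_gappy_columns(toy_alignment, 0.5)  # 50% rule
pruned_spike_style = prune_gappy_columns(toy_alignment, 0.2)   # 20% rule
```

With the 50% threshold only the fourth column (three gaps out of four sequences) is removed, whereas the stricter 20% threshold keeps only the gap-free columns.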
In the present study, the reverse vaccinology technique was utilized to model a novel multiepitope subunit vaccine against 2019-nCoV. The scheme in Figure 1 represents the complete methodology that was adopted to develop the final vaccine construct. Among 496 proteins (available in the NCBI database) from different strains of novel coronavirus, four structural proteins, i.e. spike glycoprotein, membrane glycoprotein, envelope protein and nucleocapsid protein, were prioritized for further investigation (Supplementary File 2). After sequence retrieval from NCBI, the sequences were subjected to BLASTp analysis to find homologous protein sequences. Multiple sequence alignment was done using Clustal Omega to identify the conserved regions (Sievers and Higgins, 2014). The topology of each conserved region was predicted by TMHMM Server v.2.0 (http://www.cbs.dtu.dk/services/TMHMM/), while the antigenicity of the conserved regions was determined by VaxiJen v2.0 (Doytchinova and Flower, 2007a). Only the common fragments were used for T-cell epitope enumeration via the T-cell epitope prediction server of IEDB (http://tools.iedb.org/main/tcell/) (Vita et al., 2014). Again, the TMHMM server was utilized for the prediction of the transmembrane topology of the predicted MHC-I and MHC-II binding peptides, followed by antigenicity scoring via the VaxiJen v2.0 server (Krogh et al., 2001; Doytchinova and Flower, 2007b). The epitopes with antigenic potency were picked and used for subsequent analysis. The level of conservancy indicates the ability of epitope candidates to impart broad-spectrum immunity. Homologous sequence sets of the chosen antigenic proteins were retrieved from the NCBI database using the BLASTp tool. Later, the conservancy analysis tool (http://tools.iedb.org/conservancy/) in IEDB was used to determine the conservancy level of the predicted epitopes among different viral strains.
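As a rough illustration of what a conservancy level expresses, the sketch below computes the fraction of homologous sequences that contain an epitope at or above a chosen identity threshold, using an ungapped sliding window. It is a simplified stand-in written for this explanation, not the IEDB tool's code; the function names and the toy peptide and sequences are hypothetical.

```python
def best_identity(epitope, sequence):
    """Highest fraction of matching residues over all ungapped windows."""
    k = len(epitope)
    if k == 0 or len(sequence) < k:
        return 0.0
    best = 0
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        matches = sum(a == b for a, b in zip(epitope, window))
        best = max(best, matches)
    return best / k


def conservancy(epitope, sequences, min_identity=1.0):
    """Fraction of sequences containing the epitope at >= min_identity."""
    if not sequences:
        return 0.0
    hits = sum(best_identity(epitope, seq) >= min_identity for seq in sequences)
    return hits / len(sequences)


# Toy data (made-up peptide and sequences, not epitopes from the study).
toy_epitope = "KLPDD"
toy_sequences = ["MAKLPDDFT", "MAKLPEDFT", "MGGGGGGG"]
strict = conservancy(toy_epitope, toy_sequences, min_identity=1.0)
relaxed = conservancy(toy_epitope, toy_sequences, min_identity=0.8)
```

Here the epitope is found exactly in one of three sequences and with a single mismatch in a second one, so the conservancy rises from one third to two thirds when the identity threshold is relaxed from 100% to 80%.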
The toxicity of non-allergenic epitopes was assessed using the ToxinPred server (Gupta et al., 2013). HLA distribution varies around the world among different ethnic groups and geographic regions. A population coverage study was therefore conducted using the IEDB population coverage calculation server (Vita et al., 2014). To check the allergenicity of the proposed epitopes, four distinct servers, i.e. AllergenFP, AllerTOP (Dimitrov et al., 2013), Allermatch (Fiers et al.) and Allergen Online (http://www.allergenonline.org/), were utilized. Three different algorithms from IEDB, i.e. Bepipred Linear Epitope Prediction 2.0 (Jespersen et al., 2017), Emini surface accessibility prediction (Emini et al., 1985) and the Kolaskar and Tongaonkar antigenicity scale (Kolaskar and Tongaonkar, 1990), predicted the potential B-cell epitopes within conserved fragments of the chosen viral proteins. Top CTL, HTL and B-cell epitopes were compiled to design the final vaccine constructs in the study. Each vaccine construct commenced with an adjuvant followed by the top CTL epitopes, HTL epitopes and BCL epitopes respectively. For construction of the novel coronavirus vaccine, the chosen adjuvants, i.e. L7/L12 ribosomal protein, beta defensin (a 45-mer peptide) and HABA protein (M. tuberculosis, accession number: AGV15514.1), were used (Rana and Akhter, 2016). Several linkers such as EAAAK, GGGS, GPGPG and KK, in association with the PADRE sequence, were incorporated to construct effective vaccine sequences against COVID-19. The constructed vaccines were then checked for non-allergenicity using the AlgPred tool (Azim et al., 2019). The most promising vaccine among the three constructs was then determined by assessing the antigenicity and solubility of the vaccines via VaxiJen v2.0 (Doytchinova and Flower, 2007b) and the Proso II server (Smialowski et al., 2006), respectively.
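The construct layout described above (adjuvant first, then CTL, HTL and B-cell epitopes joined by linkers) can be sketched in Python as follows. The text does not state which linker joins which element, so the assignment used here (EAAAK after the adjuvant, GGGS before each CTL epitope, GPGPG before each HTL epitope, KK before each B-cell epitope) is an assumption, as are the function name, the PADRE sequence passed in and the placeholder peptides.

```python
def assemble_construct(adjuvant, padre, ctl, htl, bcl):
    """Join an adjuvant, PADRE and epitope groups into one sequence.

    Linker placement is an assumed layout for illustration only:
    adjuvant-EAAAK-PADRE, then each epitope preceded by its group linker.
    """
    parts = [adjuvant, "EAAAK", padre]
    for group, linker in ((ctl, "GGGS"), (htl, "GPGPG"), (bcl, "KK")):
        for epitope in group:
            parts.append(linker)
            parts.append(epitope)
    return "".join(parts)


# Placeholder inputs (not the adjuvants or epitopes selected in the study).
toy_construct = assemble_construct(
    adjuvant="MAAA",
    padre="AKFVAAWTLKAAA",  # commonly cited PADRE sequence; verify before use
    ctl=["CTLAAA", "CTLGGG"],
    htl=["HTLAAA"],
    bcl=["SCLAAA"],
)
```

Keeping the linker choice in a single loop makes it easy to swap in a different layout and regenerate all constructs consistently before they are passed to downstream servers.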
The ProtParam tool (https://web.expasy.org/protparam/), provided by the ExPASy server (Hasan et al., 2019c), was used to functionally characterize the vaccine constructs (Gasteiger et al., 2003). The studied functional properties were isoelectric pH, molecular weight, aliphatic index, instability index, hydropathicity, estimated half-life, GRAVY values and other physicochemical characteristics. Alpha helix, beta sheet and coil structures of the vaccine constructs were analyzed through the GOR4 secondary structure prediction method using Prabi (https://npsaprabi.ibcp.fr/). In addition, ESPript 3.0 (Robert & Gouet, 2014) was also used to predict the secondary structure of the stated protein sequences. The vaccine 3D model was generated on the basis of percentage similarity between the target protein and available template structures from PDB using I-TASSER (Peng and Xu, 2011). The modeled structures were further refined via the FG-MD refinement server. Structure validation was performed by Ramachandran plot assessment in RAMPAGE (Hasan et al., 2019b). Using the DbD2 server, probable disulfide bonds were designed for the anticipated vaccine constructs (Craig and Dombkowski, 2013). An energy value below 2.5 was considered, while the chi3 value for residue screening was chosen between -87 and +97 for the operation (Hasan et al., 2019b). The B-cell epitopes of the putative vaccine molecules were predicted via the ElliPro server (http://tools.iedb.org/ellipro/) with a minimum score of 0.5 and a maximum distance of 7 Å (Ponomarenko et al., 2004). Moreover, IFN-inducing epitopes within the vaccine were predicted using IFNepitope with the motif and SVM hybrid detection strategy (Hajighahramani et al., 2017). Normal mode analysis (NMA) was performed to predict the stability and large-scale mobility of the vaccine protein. The iMod server determined the stability of construct V3 by comparing the essential dynamics to the normal modes of the protein (Aalten et al., 1997; Wuthrich et al., 1980).
NMA is a recommended alternative to costly atomistic simulation (Tama and Brooks, 2006; Cui and Bahar, 2007) and provides much quicker and more efficient assessments than typical molecular dynamics (MD) simulation tools (Prabhakar et al., 2016; Awan et al., 2017). The main-chain deformability was also predicted by measuring the capacity of the target molecule to deform at each of its residues. The motion stiffness was represented via the eigenvalue, while the covariance matrix and elastic network model were also analyzed. The PatchDock server was used for docking between different HLA alleles and the putative vaccine molecules. In addition, the superior construct was also docked with different human immune receptors such as ACE2, APN, DPP4 and TLR-8. The 3D structures of these receptors were retrieved from the RCSB Protein Data Bank. The highest binding affinity between the putative vaccine molecules and the receptors was identified based on the lowest interaction energy of the docked structure. The JCat tool was utilized for codon adaptation in order to enhance the expression of vaccine construct V3 in E. coli strain K12. For this, some restriction enzyme sites (i.e. BglI and BglII), Rho-independent transcription terminators and prokaryotic ribosome-binding sites were avoided (Grote et al., 2005). After that, the mRNA sequence of the constructed V3 vaccine was ligated between the BglI (401) and BglII (2187) restriction sites at the C-terminal and N-terminal sites respectively. The SnapGene tool was utilized for in silico restriction cloning (Solanki and Tiwari, 2018).
In the phylogenetic analysis, we included different coronaviruses from three different genera (Forni et al., 2017; Zhou et al., 2020; Zumla et al., 2016). Among these, the first five species belong to the betacoronavirus genus, while the last two belong to the alphacoronavirus genus.
Apart from the human coronaviruses, we included other coronaviruses whose hosts are different species of bats, whale, turkey, rat, mink, ferret, swine, camel, rabbit, cow and others (Supplementary Table).
Domain analysis of the spike proteins of coronaviruses reveals that they mainly contain one signature domain, namely coronavirus S2 glycoprotein (IPR002552), which is present in all the candidates. All other betacoronaviruses contain the spike receptor binding protein (IPR018548), the coronavirus spike glycoprotein heptad repeat 2 domain (IPR027400) and the spike receptor binding domain superfamily (IPR036326). SARS-CoV-1 contains an extra domain, namely the spike glycoprotein N-terminal domain (IPR032500), which is also present in some of the sub-genera (Embecovirus) of Betacoronavirus, but not in COVID-19. One important finding in our study is that the COVID-19 candidates do not contain the spike glycoprotein domain (IPR042578), which is present in SARS-CoV-1 (Figure 3). The secondary structure prediction study shows a large number of cysteine residues, which contribute to the formation of disulfide bonds within the spike protein. Most of them fall within the S1 spike protein, which is 654 amino acids long in SARS-CoV-1 and 672 amino acids long in COVID-19. The RGD motif, which is conserved within COVID-19, is present in the vicinity of the S1 protein. It exists as KGD, which clearly demonstrates mutation over a short time period. Again, the receptor binding domain and receptor binding motif analyses disclose variations within several regions between COVID-19 and SARS-CoV-1 (Supplementary File 2). The domain-based phylogenetic analysis reflects two main divisions, where all the novel betacoronaviruses, i.e. COVID-19, form a clade with SARS-CoV-1, while the other betacoronaviruses fall in another clade which further divides to give rise to different sub-genera.
This clearly shows that COVID-19 has a specific ancestral connection to SARS-CoV-1 in terms of spike glycoproteins. Interestingly, our study also revealed close relatedness of both SARS-CoV-1 and COVID-19 to the bat betacoronavirus that belongs to the Hibecovirus sub-genus. However, in our study, the bat coronaviruses of the Nobecovirus sub-genus did not fall into the same clade as the novel coronaviruses. The phylogenetic study and MSA also revealed that the functional portion of the spike glycoprotein domain and the spike glycoprotein N-terminal domain might have been lost from COVID-19 during the course of evolution.
The envelope proteins of both Betacoronavirus and Alphacoronavirus contain only one protein domain (IPR003873), namely nonstructural protein NS3 or small envelope protein E (NS3/E). This domain is well conserved in coronaviruses and is also found in murine hepatitis virus. On the other hand, the gammacoronavirus is an exception: it possesses the IBV3C protein domain (IPR005296), which is thought to be expressed from the ORF3C gene of infectious bronchitis virus (Jia & Naqi, 1997) (Figure 4). Previous findings showed that the envelope proteins of the MERS virus and SARS-CoV-1 are closely related in terms of secondary structure and function (Surya et al., 2015). In contrast to these earlier findings, we found that the gammacoronavirus candidate in our study shows a close connection with both SARS-CoV-1 and COVID-19 in terms of envelope proteins.
The length of the nucleocapsid proteins of the betacoronavirus genus ranges from 410 to 450 amino acids. Three signature domains are mainly present in the nucleocapsid proteins: Coronavirus Nucleocapsid protein (IPR001218), Nucleocapsid Proteins C-terminal (IPR037179) and Nucleocapsid Proteins N-terminal (IPR037195).
However, in our experiment, we did not find these domains in HCoV-HKU1 (Figure 6).
The analysis showed that the immunogenic conserved sequences from the corresponding proteins, except spike glycoprotein, met the criteria of desired exomembrane characteristics (Table 1). A plethora of immunogenic epitopes were generated from the conserved sequences that were able to bind with the largest number of HLA alleles (Supplementary Table 1, Supplementary Table 2, Supplementary Table 3 and Supplementary Table 4). Top epitopes with exomembrane characteristics were ranked for each individual protein after investigating their antigenicity score and transmembrane topology (Table 2). Epitopes from each protein showed a high level of conservancy, up to 100% (Table 2). The ToxinPred server predicted the relative toxicity of each epitope, which indicated that the top epitopes were non-toxic in nature (Supplementary Table 5). Population coverage analysis of the four structural proteins was also performed for the predicted CTL and HTL epitopes. The results showed that the populations of the various geographic regions could be covered by the predicted T-cell epitopes (Figure 7). Finally, the allergenic epitopes were excluded from the list based on the evaluation of four allergenicity prediction servers (Supplementary Table 5). Top B-cell epitopes were predicted for spike glycoprotein, membrane glycoprotein, envelope protein and nucleocapsid protein using 3 distinct algorithms from IEDB (i.e. Bepipred Linear Epitope prediction, Emini Surface Accessibility, Kolaskar & Tongaonkar Antigenicity prediction). The epitopes were also analyzed for their VaxiJen score and allergenicity (Table 3). Three putative vaccine molecules (i.e. V1, V2 and V3) were constructed, each comprising a protein adjuvant, eight T-cell epitopes, twelve B-cell epitopes and respective linkers (Supplementary Table 6). The PADRE sequence was included to extend the efficacy and potency of the constructed vaccine.
The putative vaccine constructs V1, V2 and V3 were 397, 481 and 510 residues long respectively. The allergenicity score of V3 (-0.89886723) revealed that it was superior among the three constructs in terms of safety and efficacy. V3 also had a solubility score (0.60) (Figure 8E) and antigenicity (0.58) over the threshold value (Table 4). The ProtParam tool was employed to analyze the physicochemical properties of V3. The tertiary structure of the putative vaccine construct V3 was generated using the I-TASSER server (Figure 8A and 8B). The server used the 10 best templates with the highest significance (measured via Z-score) from the LOMETS threading program to model the 3D structure. After refinement, Ramachandran plot analysis revealed that 92.7% and 5.7% of residues were in the favored and allowed regions respectively, while only 8 residues (1.6%) were in the outlier region (Figure 8C). The overall quality factor determined by the ERRAT server was 91.56% (Figure 8D). The ElliPro server predicted a total of 6 conformational B-cell epitopes from the 3D structure of construct V3. Epitope No. 1 was considered the broadest conformational B-cell epitope, with 25 amino acid residues (Figure 9 and Supplementary Table 8). The stability of vaccine construct V3 was investigated through mobility analysis (Figure 10A and 10B), B-factor, eigenvalue and deformability analysis, the covariance map and the recommended elastic network model. Results revealed that the placement of hinges in the chain was insignificant (Figure 10C) and the B-factor column gave an averaged RMS (Figure 10D). The estimated eigenvalue of 6.341333e-06 (Figure 10E) indicated a low chance of deformation of vaccine protein V3. The correlation matrix and elasticity of the construct are shown in Figure 10G and Figure 10H, respectively. The structural interaction between HLA alleles and the designed vaccines was investigated by a molecular docking approach.
The server detected the complexed structure by focusing on the complementarity score, ACE (Atomic Contact Energy) and estimated interface area of the compound (Table 5). The molecular affinity between the putative vaccine molecule V3 and several immune receptors was also examined. The result showed that construct V3 interacted with each receptor with significantly lower binding energy (Figure 11). The Codon Adaptation Index (CAI) and GC content for the predicted codons of the putative vaccine construct V3 were 1.0 and 51.56% respectively. An insert of 1542 bp was found which lacked the restriction sites for BglI and BglII, thus making it suitable for cloning. The codons were inserted into the pET28a(+) vector alongside two restriction sites (BglI and BglII) and a clone of 5125 base pairs was generated (Figure 12).
In December 2019, a new coronavirus outbreak emerged in Wuhan, China, causing turmoil among the medical community as well as the rest of the world. The new species has been named 2019-nCoV or SARS-CoV-2, and has already caused a considerable number of infections and deaths in China, Italy, Spain, Iran, USA and, to a growing degree, throughout the world. The major outbreak and spread of SARS-CoV-2 in 2020 forced the scientific community to make considerable investment and research efforts to develop a vaccine against the pathogen. However, owing to its high infectivity and pathogenicity, the culture of SARS-CoV-2 needs biosafety level 3 conditions, which may have obstructed the rapid development of any vaccine or therapeutics. It has been found that about 35 companies and academic institutions are engaged in such work (Spinney et al., 2020; Ziady et al., 2020).
Among the potential SARS-CoV-2 vaccines in the pipeline, four have nucleic acid based designs, four involve non-replicating viruses or protein constructs, two contain live attenuated virus and one involves a viral vector (Pang et al., 2020), while only one, called mRNA-1273 (developed by NIAID in collaboration with Moderna, Inc.), has been confirmed to start a phase 1 trial (NIH, 2020). In this study, however, we emphasized a different approach by prioritizing the advantages of different genome and proteome databases using the immunoinformatic approach. Computational vaccine predictions have been adopted by researchers to design vaccines against both MERS-CoV (Sudhakar et al., 2013; Fernando et al., 2013) and SARS-CoV-1 (Yang et al., 2003; Oany et al., 2014), targeting the outer membrane or functional proteins (Sharmin and Islam, 2014). Several in silico strategies have also been employed to predict potential T-cell and B-cell epitopes against SARS-CoV-2, emphasizing either spike glycoprotein or envelope proteins (Behbahani, 2020; Rasheed et al., 2020). None of the studies, however, focused on other structural proteins. Moreover, random genetic changes and mutations in the protein sequences (Yin, 2020) may obstruct the development of effective vaccines and therapeutics against human coronavirus in the future. Hence, the present study was undertaken to identify the similarity and divergence among the close relatives of the target pathogen and to develop a novel chimeric recombinant vaccine considering all major structural proteins, i.e. spike glycoprotein, membrane glycoprotein, envelope protein and nucleocapsid protein, simultaneously. The topology of the phylogenetic trees of the whole genome and the stated four protein sequences from different species of coronaviruses reveals that SARS-CoV-1 and bat coronaviruses are the closest homologs of the novel coronaviruses.
Our results indicate a significant level of similarity between COVID-19 and SARS-CoV-1, which is in line with previous findings (Jaimes et al., 2020; Wu, 2020a). The sequence similarities between SARS-CoV, bat coronaviruses and COVID-19 in the reported studies (Hu et al., 2018; Wu, 2020b) suggest that they are distantly related; nevertheless, they are capable of infecting humans and may therefore have undergone adaptive convergent evolution. Interestingly, the COVID-19 envelope proteins form a clade with the turkey coronavirus, which belongs to the Gammacoronavirus genus. So, in terms of envelope proteins, the envelope gene of turkey coronavirus might contribute to the convergence process, which needs further analysis. In addition, from the domain-based phylogeny of nucleocapsid proteins, it can be deduced that this protein might have originated in bats, been transmitted to camels and later adopted humans as a potential host. Overall, COVID-19 might have gone through complex adaptation strategies in order to be transmitted into humans via different animals. The homologous protein sets for the four structural proteins of coronavirus were sorted to identify conserved regions through BLASTp analysis and MSA. Only the conserved sequences were utilized to identify potential B-cell and T-cell epitopes for each individual protein (Table 1). Thus, our constructs are expected to stimulate broad-spectrum immunity in the host upon administration. Cytotoxic CD8+ T lymphocytes (CTL) play a crucial role in controlling the spread of pathogens by recognizing and killing diseased cells or by means of antiviral cytokine secretion (Garcia et al., 1999). Thus, T-cell epitope-based vaccination is a unique process to confer a defensive response against pathogenic candidates (Shrestha, 2004).
Approximately 800 MHC-I peptides (CTL epitopes) and 600 MHC-II peptides (HTL epitopes) were predicted via the IEDB server, from which we screened the top candidates by analyzing the antigenicity score, transmembrane topology, conservancy level and other important physicochemical parameters using a number of bioinformatics tools (Table 2). The top 10 epitopes from each protein were further assessed for their toxicity profile and allergenicity pattern. Different servers rely on different parameters to predict the allergenic nature of small peptides. Therefore, we used four distinct servers for this assessment, and the epitopes predicted as non-allergenic by at least three servers were retained for further analysis (Supplementary Table 5). Vaccines initiate the generation of effective antibodies, which are usually produced by B cells and perform effector functions by specifically targeting foreign particles (Cooper & Nemerow, 1984). The potential B cell epitopes were generated by three different algorithms (Bepipred linear epitope prediction 2.0, Kolaskar and Tongaonkar antigenicity prediction and Emini surface accessibility prediction) from the IEDB database (Table 3).
Suitable linkers and adjuvants were used to combine the top finalized epitopes from each protein into multi-epitope vaccine molecules (Supplementary Table 6). As the PADRE sequence is usually recommended to lessen the problems caused by polymorphism of HLA molecules in the population (Ghaffari-Nazari et al., 2015), it was also incorporated into the final vaccine molecule. Here, the adjuvants would enhance the immunogenicity of the vaccine constructs, and the linkers would ensure appropriate separation of epitopes in the host environment (Yang et al., 2015). Allergenicity, physicochemical properties, antigenicity and three-dimensional structure of the vaccine constructs were characterized, and it was concluded that V3 was superior to the V1 and V2 vaccine constructs.
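The allergenicity screen described above, in which an epitope is retained when at least three of the four servers predict it to be non-allergenic, amounts to a simple vote count. The Python sketch below illustrates that rule only; the function name, the server labels (`server_A` to `server_D`) and the epitope labels are hypothetical placeholders, and the calls are invented for illustration rather than taken from Supplementary Table 5.

```python
from typing import Dict, List


def retain_non_allergens(
    allergen_calls: Dict[str, Dict[str, bool]], min_non_allergen_votes: int = 3
) -> List[str]:
    """Keep epitopes that enough servers classify as non-allergenic.

    allergen_calls[epitope][server] is True when that server predicts "allergen".
    """
    retained = []
    for epitope, calls in allergen_calls.items():
        # Count the servers that voted "non-allergen" for this epitope.
        non_allergen_votes = sum(1 for is_allergen in calls.values() if not is_allergen)
        if non_allergen_votes >= min_non_allergen_votes:
            retained.append(epitope)
    return retained


# Hypothetical calls for three placeholder epitopes across four placeholder servers.
example_calls = {
    "EPITOPE_1": {"server_A": False, "server_B": False, "server_C": False, "server_D": False},
    "EPITOPE_2": {"server_A": False, "server_B": True, "server_C": False, "server_D": False},
    "EPITOPE_3": {"server_A": True, "server_B": True, "server_C": False, "server_D": False},
}
print(retain_non_allergens(example_calls))
```

With these invented calls, the first two placeholder epitopes pass the three-of-four rule and the third, with only two non-allergen votes, is dropped.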
The final construct also contained several interferon-α producing epitopes (Supplementary Table 8). The vaccine protein (V3) was subjected to disulfide engineering to enhance its stability. Normal mode analysis in internal coordinates by iMODS was employed to investigate the collective motion of the vaccine molecule (Lopez-Blanco et al., 2014). The analysis indicated a negligible chance of deformability at the molecular level for the putative vaccine construct V3, thereby strengthening our prediction. Moreover, molecular docking was performed to analyze the binding affinity of the vaccine with different HLA molecules, i.e. DRB1*0101, DRB5*0101, DRB3*0202, DRB1*0401, DRB3*0101 and DRB1*0301 (Table 5). It has been reported that a specific receptor-binding domain of the CoV spike protein recognizes its host receptor ACE2 (angiotensin-converting enzyme 2) (Li et al., 2003; Li, 2015). Previous studies also identified dipeptidyl peptidase 4 (DPP4) as a functional receptor for human coronavirus (Raj et al., 2013). Therefore, we performed another docking study with these host receptors to strengthen our prediction (Figure 11). The results showed that the designed construct bound to the selected receptors with low binding energy, which is considered biologically significant. Finally, in silico restriction cloning was adopted to check the suitability of construct V3 for insertion into the pET28a(+) vector and expression in E. coli strain K12 (Figure 12).
Traditional approaches to vaccine development are time-consuming and laborious. Moreover, the results may not always be as expected or fruitful (Stratton et al., 2003; Hasan et al., 2019). In silico prediction and prescreening methods, in contrast, save time and cost in production. Therefore, the present study may aid in the development of preventive strategies and novel vaccines to combat infections caused by 2019-nCoV.
However, further wet lab trials involving model organisms are needed to validate our findings.
The darker the greys, the stiffer the springs (H).
A Comparison of Techniques for Calculating Protein Essential Dynamics
Mutation-structure function relationship based integrated strategy reveals the potential impact of deleterious missense mutations in autophagy related proteins on hepatocellular carcinoma (HCC): a comprehensive informatics approach
Conglomeration of highly antigenic nucleoproteins to inaugurate a heterosubtypic next generation vaccine candidate against Arenaviridae family. bioRxiv
Immunoinformatics approaches for designing a novel multi epitope peptide vaccine against human norovirus
In silico Design of novel Multi-epitope recombinant Vaccine based on Coronavirus Spike glycoprotein
Genomic variance of the 2019-nCoV coronavirus
Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study
Pathogenicity and transmissibility of 2019-ncov-a quick overview and comparison with other emerging viruses
The role of antibody and complement in the control of viral infections
Disulfide by Design 2.0: A web-based tool for disulfide engineering in proteins
normal mode analysis theoretical and applications to biological and chemical systems
AllerTOP v.2 - A server for in silico prediction of allergens
AllergenFP: Allergenicity prediction by descriptor fingerprints
Identifying candidate subunit vaccines using an alignment-independent method based on principal amino acid properties
A server for prediction of protective antigens, tumour antigens and subunit vaccines
Epitope discovery and their use in peptide based vaccines
Induction of hepatitis A virus-neutralizing antibody by a virus-specific synthetic peptide
Engineering a replication-competent, propagation defective middle east respiratory syndrome coronavirus as a vaccine candidate
Allermatch™, a webtool for the prediction of potential allergenicity according to current FAO/WHO Codex alimentarius guidelines
Molecular Evolution of Human Coronavirus Genomes
Structural basis of T cell recognition
Improving Multi-Epitope Long Peptide Vaccine Potency by Using a Strategy that Enhances CD4+T Help in BALB/c Mice
A decade after SARS: strategies for controlling emerging coronaviruses
JCat: A novel tool to adapt codon usage of a target gene to its potential expression host
Clinical characteristics of coronavirus disease 2019 in China
In Silico Approach for Predicting Toxicity of Peptides and Proteins
Vaccinomics strategy for developing a unique multi-epitope monovalent vaccine against Marburg marburgvirus. Infection, Genetics and Evolution
Contriving a chimeric polyvalent vaccine to prevent infections caused by Herpes Simplex Virus (Type-1 and Type-2): an exploratory immunoinformatic approach
Reverse vaccinology approach to design a novel multi-epitope subunit vaccine against avian influenza A (H7N9) virus. Microbial pathogenesis
Genomic characterization and infectivity of a novel SARS-like coronavirus in Chinese bats. Emerging Microbes and Infections
Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China
The continuing 2019-nCoV epidemic threat of novel coronaviruses to global health-The latest 2019 novel coronavirus outbreak in Wuhan
Structural modeling of 2019-novel coronavirus (nCoV) spike protein reveals a proteolytically-sensitive activation loop as a distinguishing feature compared to SARS-CoV and related SARS-like coronaviruses
Chinese scientists race to develop vaccine as coronavirus death toll jumps. South China Morning Post
BepiPred-2.0: Improving sequence-based B-cell epitope prediction using conformational epitopes
Sequence analysis of gene 3, gene 4 and gene 5 of avian infectious bronchitis virus strain CU-T2
MAFFT multiple sequence alignment software version 7: Improvements in performance and usability
Epitope-based peptide vaccine design and target site depiction against Ebola viruses: an immunoinformatics study
A semi-empirical method for prediction of antigenic determinants on protein antigens
Predicting transmembrane protein topology with a hidden Markov model: Application to complete genomes
COVID-19 pandemic: palliative care for elderly and frail patients at home and in residential and nursing homes
Partitionfinder 2: New methods for selecting partitioned models of evolution for molecular and morphological phylogenetic analyses
Receptor recognition mechanisms of coronaviruses: a decade of structural studies
Peptide vaccine: progress and challenges
Angiotensin-converting enzyme 2 is a functional receptor for the SARS coronavirus
iMODS: internal coordinates normal mode analysis server
Outbreak of Pneumonia of Unknown Etiology in Wuhan China: the Mystery and the Miracle
Subcellular location and topology of severe acute respiratory syndrome coronavirus envelope protein
NIH clinical trial of investigational vaccine for COVID-19 begins. Study enrolling Seattle-based healthy adult volunteers. Accessed on
Design of an epitope-based peptide vaccine against spike protein of human coronavirus: an in silico approach. Drug design, development and therapy
Immunoinformatics approaches to design a novel multi-epitope subunit vaccine against HIV infection. Vaccine
Potential rapid diagnostics, vaccine and therapeutics for 2019 novel coronavirus (2019-nCoV): a systematic review
Exploiting structure information for protein alignment by statistical inference
Vaccine adjuvants: current state and future trends. Immunology and cell biology
Monomerization alters the dynamics of the lid region in campylobacter jejuni CstII: an MD simulation study
More than one reason to rethink the use of peptides in vaccine design
Dipeptidyl peptidase 4 is a functional receptor for the emerging human coronavirus-EMC
Reverse vaccinology
In Silico Identification of Novel B Cell and T Cell Epitopes of Wuhan Coronavirus (2019-nCoV) for Effective Multi Epitope-Based Peptide Vaccine Production
Deciphering key features in protein structures with the new ENDscript server
Mrbayes 3.2: Efficient bayesian phylogenetic inference and model choice across a large model space
In silico identification and characterization of common epitope-based peptide vaccine for Nipah and Hendra viruses. Asian Pacific journal of tropical medicine
A highly conserved WDYPKCDRA epitope in the RNA directed RNA polymerase of human coronaviruses can be used as epitope-based universal vaccine design
Role of CD8+ T cells in control of West Nile virus infection
Clustal omega, accurate alignment of very large numbers of sequences
Protein solubility: Sequence based prediction and experimental verification
Phyutility: A phyloinformatics tool for trees, alignments and molecular data
Subtractive proteomics to identify novel drug targets and reverse vaccinology for the development of chimeric vaccine against Acinetobacter baumannii
When will a coronavirus vaccine be ready? The Guardian. Retrieved
RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies
Immunization safety review: vaccinations and sudden unexpected death in infancy
Platform strategies for rapid response against emerging coronaviruses: MERS-CoV serologic and antigenic relationships in vaccine design
Potential factors influencing repeated SARS outbreaks in China
COVID-19: Epidemiology, Evolution, and Cross-Disciplinary Perspectives
MERS coronavirus envelope protein has a single transmembrane domain that forms pentameric ion channels
Symmetry, form, and shape: guiding principles for robustness in macromolecular machines
Efficacy of killed virus vaccine, live attenuated chimeric virus vaccine, and passive immunization for prevention of West Nile virus encephalitis in hamster model. Emerging infectious diseases
Cytokines: the future of intranasal vaccine adjuvants. Clinical and Developmental Immunology
The immune epitope database (IEDB) 3.0
A novel coronavirus outbreak of global health concern. The Lancet
Jalview Version 2 - A multiple sequence alignment editor and analysis workbench
Coronavirus disease (COVID-19) outbreak
Infection prevention and control during health care when COVID-19 is suspected: interim guidance
Characteristics of and important lessons from the coronavirus disease 2019 (COVID-19) outbreak in China: summary of a report of 72314 cases from the Chinese Center for Disease Control and Prevention
A new coronavirus associated with human respiratory disease in China
Strong evolutionary convergence of receptor-binding protein spike between COVID-19 and SARS-related coronaviruses
Strong evolutionary convergence of receptor-binding protein spike between COVID-19 and SARS-related coronaviruses
Correlations between internal mobility and stability of globular proteins
An evolutionary RGD motif in the spike protein of SARS-CoV-2 may serve as a potential high risk factor for virus infection?
In silico design of a DNA-based HIV-1 multiepitope vaccine for Chinese populations
A DNA vaccine induces SARS coronavirus neutralization and protective immunity in mice
Genotyping coronavirus SARS-CoV-2: methods and implications
Atomic-Level Protein Structure Refinement Using Fragment-Guided Molecular Dynamics Conformation Sampling
A pneumonia outbreak associated with a new coronavirus of probable bat origin
Network-based drug repurposing for novel coronavirus 2019-nCoV/SARS-CoV-2
A novel coronavirus from patients with pneumonia in China
Biotech company Moderna says its coronavirus vaccine is ready for first tests
Coronaviruses-drug discovery and therapeutic options
The authors would like to acknowledge the Department of Biochemistry and Chemistry, the Department of Microbial Biotechnology and the Department of Pharmaceuticals and Industrial Biotechnology of Sylhet Agricultural University for technical support of the project.
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
The authors declare that they have no conflict of interest.
Supplementary Table 1: Predicted CTL and HTL epitopes of spike glycoprotein.