key: cord-1046732-ccd5w05f authors: Sarma, Vyshnavie R.; Olotu, Fisayo A.; Soliman, Mahmoud E.S. title: Integrative immunoinformatics paradigm for predicting potential B-cell and T-cell epitopes as viable candidates for subunit vaccine design against COVID-19 virulence date: 2021-05-18 journal: Biomed J DOI: 10.1016/j.bj.2021.05.001 sha: 8abb713bad2aa5522d3e566ae36c811d4e8b465a doc_id: 1046732 cord_uid: ccd5w05f
BACKGROUND: The increase in global mortality from SARS-CoV-2 (COVID-19) infection has been alarming, necessitating a continued search for viable therapeutic interventions. Because they contain minimal microbial components, subunit (peptide-based) vaccines have demonstrated improved efficacy in stimulating immunogenic responses by host B- and T-cells.
MATERIALS AND METHODS: Integrative immunoinformatics algorithms were used to determine linear and discontinuous B-cell epitopes from the S-glycoprotein sequence. End-point selection of the most promising B-cell epitope was based on essential physicochemical attributes. NetCTL-I and NetMHC-II algorithms were used to predict probable MHC-I and MHC-II T-cell epitopes for the globally frequent HLA-A*02:01, HLA-B*35:01, HLA-B*51:01 and HLA-DRB1*15:02 molecules. Highly probable T-cell epitopes were selected based on their high propensities for C-terminal cleavage, transport protein (TAP) processing and MHC-I/II binding.
RESULTS: Preferential epitope binding sites were further identified on the HLA molecules using a blind peptide-docking method. Phylogenetic analysis revealed a close relationship between the SARS-CoV-2 and SARS-CoV S-proteins. LALHRSYLTPGDSSSGWTAGAA(242→263) was the most probable B-cell epitope, with optimal physicochemical attributes. The MHC-I antigen presentation pathway was highly favourable for YLQPRTFLL(269-277) (HLA-A*02:01), LPPAYTNSF(24-32) (HLA-B*35:01) and IPTNFTISV(714-721) (HLA-B*51:01).
Also, LTDEMIAQYTSALLA(865-881) exhibited the highest binding affinity to HLA-DRB1*15:01, with core interactions mediated by IAQYTSALL(870-878). The SARS-CoV-2 epitope YLQPRTFLL(269-277) bound preferentially to a previously undefined site on HLA-A*02:01, suggestive of a novel site for MHC-I-mediated T-cell stimulation.
CONCLUSION: This study implemented combinatorial immunoinformatics methods to model B- and T-cell epitopes with high potential to trigger immunogenic responses to the S protein of SARS-CoV-2.
Coronavirus disease is a pneumonia-related illness that can progress from a mild to a severe condition. The causative virus belongs to the family Coronaviridae, whose members possess a positive-sense, single-stranded, polyadenylated RNA genome and can infect both humans and animals. Coronaviruses have been identified in several avian hosts as well as in various mammals, including camels, bats, masked palm civets, mice, dogs, and cats. [1, 2] At present, the novel coronavirus SARS-CoV-2 (2019-nCoV), which has engendered a global panic, reportedly originated from a food market in a central China metropolis. This, in turn, accounted for severe epidemic outbreaks in other provinces of Mainland China and further spread to 27 other countries. Death rates caused by the novel coronavirus strain are currently increasing rapidly. [3, 4] According to research investigations in the Laboratory of Biosafety, National Institute for Viral Disease Control and Prevention, 2019-nCoV is structurally different from SARS-CoV, which underlies its identification as a novel host-infecting betacoronavirus with a genome size that ranges from 26 to 32 kilobases. [5, 6] Phylogenetic studies of the coronavirus indicate that bats might be the original host of this virus. [7] The coronavirus consists of the following proteins: spike (S) proteins, membrane (M) proteins and nucleocapsid (N) proteins (Figure 1).
These proteins play several crucial roles in the pathogenesis, infection and transmission of the virus in humans. The spike glycoprotein (S) of coronavirus is cleaved into two subunits (S1 and S2). The S1 subunit helps in receptor binding and the S2 subunit facilitates membrane fusion. The spike glycoproteins of coronaviruses are important determinants of tissue tropism and host range. In addition, the spike glycoproteins are critical targets for vaccine development. N is the only protein that functions primarily to bind to the CoV RNA genome, making up the nucleocapsid. [8] [9] [10] Although N is largely involved in processes relating to the viral genome, it is also involved in other aspects of the CoV replication cycle and the host cellular response to viral infection. The M protein is the most abundant structural protein and defines the shape of the viral envelope. It is also regarded as the central organizer of viral assembly, interacting with all other major SARS-CoV-2 structural proteins. [11]
SARS-CoV-2 is transmitted from human to human through close contact, especially through viral droplets from sneezing and coughing. The symptoms of the viral disease include high fever, dry cough, and breathing difficulties. The virus has been estimated to spread at a rate of 3 to 5, i.e. each established infection leads to 3 to 5 further infections, even during the incubation period. Other research groups have estimated the basic reproduction number to be between 1.4 and 3.8. Moreover, it has been established that the virus can transmit along a chain of at least four people. [12]
Researchers are making assiduous attempts to identify effective treatments for the disease, and currently remdesivir and chloroquine have reportedly been used in clinical trials to treat patients with COVID-19. Regardless, there is still a need for continued efforts to design strong vaccines to curtail viral spread.
Peptide-based vaccines have shown promise against pathogenic virulence since they contain only minimal components of the infectious microbe. These components are sufficient to effectively trigger immunogenic responses mediated by B-cells and T-cells. [13] Peptide-based vaccines are safer than other vaccine types because of their minimal allergic and toxic properties. This explains the numerous studies aimed at peptide-vaccine design. [14, 15] Relatedly, immunoinformatics approaches have contributed substantially to the identification of potential vaccine candidates against microbial diseases by enabling the prediction of highly probable B-cell and T-cell epitopes. [16, 17] Immunoinformatics methods incorporate multiple algorithms that assist the prediction of promising B-cell and T-cell epitopes, which are essential for peptide-vaccine construction. Promising B-cell epitopes are selected based on inherent physicochemical properties such as flexibility, surface exposure/accessibility, hydrophilicity, and antigenicity. Moreover, predictions of peptide MHC-I/II binding affinities, proteasomal C-terminal cleavage and TAP transport efficiency are essential for identifying the most promising T-cell epitopes for MHC-I and MHC-II molecules. [18, 19]
Therefore, our aim in this study was to use immunoinformatics methodologies to identify promising antiviral peptides (B-cell and T-cell epitopes) to impede the pathogenic process of SARS-CoV-2. We believe findings from this study will contribute vitally to vaccine development research relating to COVID-19 treatment.
The flowchart presented in Figure 2 summarizes the approaches employed in this study to identify promising B-cell and T-cell epitopes as vaccine candidates for curtailing the pathogenicity of SARS-CoV-2. The incorporated methodologies are elaborated below.
ViralZone, a database of the ExPASy Bioinformatics Resource Portal, was used to retrieve information such as the host, transmission, ailment, genus, family, genome, and proteome of the virus. [20] The reviewed S-protein sequences of human coronavirus strains (HCOV-229E, HCOV-NL63, HCOV-HKU1, HCOV-EMC, HCOV-OC43, and SARS-CoV-2) were obtained as presented in Table 1. The primary sequence of SARS-CoV-2 (QHD43416.1) was obtained from the NCBI (National Center for Biotechnology Information) database, while the protein sequences of the other strains were retrieved from the UniProtKB database in FASTA format for further analysis. The analysis of evolutionary divergence was performed using the MEGA 7.0 software, and the result was represented as a phylogenetic tree. The phylogenetic tree was constructed using a distance of 0.10 with default parameters [21, 22].
Furthermore, the selected protein sequence (QHD43416.1) was subjected to secondary and tertiary structural analyses. The secondary structure of the protein was studied using the SOPMA (self-optimized prediction method) algorithm, which helped identify the alpha helices, beta sheets, and coils of the structure. [23] The structure of the SARS-CoV-2 spike glycoprotein (QHD43416.1) was then modeled using the I-TASSER (Iterative Threading Assembly Refinement) algorithm, which entails replica-exchange Monte Carlo simulations. This enabled the prediction and modeling of protein structures via an exhaustive search method appropriate for identifying the best-matching protein template. Herein, the PDB protein 5X58A was identified and used as a template [24], with a z-score of 15.24 indicative of a considerable degree of accuracy. The structural model was validated using the Rampage tool. [25]
B-cell epitope predictions were performed for constituent linear and discontinuous (conformational) epitopes.
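
To make the notion of evolutionary divergence between the spike sequences more concrete, the following is a minimal sketch of a pairwise p-distance (the proportion of differing sites between two aligned sequences). It is not the procedure run in MEGA, whose substitution model and settings are not specified beyond "default parameters"; the function names and the toy aligned fragments are hypothetical and are not real spike sequences.

```python
from __future__ import annotations

from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable (ungapped) sites")
    return differing / compared


def distance_matrix(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise p-distances for every unordered pair of named sequences."""
    return {
        (x, y): p_distance(aligned[x], aligned[y])
        for x, y in combinations(sorted(aligned), 2)
    }


# Toy aligned fragments (hypothetical placeholders, NOT real S-protein data).
toy = {
    "strain_A": "MFVFLVLLPL",
    "strain_B": "MFIFLLFLTL",
    "strain_C": "MF-FLVLLPL",
}

if __name__ == "__main__":
    for pair, dist in distance_matrix(toy).items():
        print(pair, round(dist, 3))
```

A matrix of this kind is the usual input to distance-based tree-building methods; smaller values place two strains closer together on the tree.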
The protein sequence (QHD43416.1) was initially subjected to B-cell epitope prediction using the ElliPro algorithm (which combines Thornton's method with a residue clustering algorithm, the MODELLER program, and the Jmol viewer) to identify both linear and conformational epitopes, with the predictive threshold set to a minimum of 0.7. In this study, a predictive threshold of 1.000 was set to predict the physicochemical properties of the linear B-cell epitopes. These attributes were defined using the IEDB-integrated Karplus and Schulz flexibility [26], Kolaskar and Tongaonkar antigenicity [27], Parker hydrophilicity [28] and Emini surface accessibility methods [29]. These properties cumulatively account for the immunogenic tendencies of B-cell epitopes. [30] Moreover, the discontinuous epitopes were also predicted from the secondary structures of the antigenic protein based on their protrusion indices (PI), which indicate conformational protrusion. In other words, PI provides a simple way of detecting those regions that bulge from the protein's surface and therefore have potential for B-cell recognition. Residues with high protrusion index values are often associated with antigenic sites [31].
Allergenicity of the linear B-cell epitope was evaluated using the AlgPred method, which integrates support vector machine, motif-based and BLAST-search algorithms to predict whether a particular epitope is an allergen or a non-allergen, with a reported accuracy of 85%. This makes AlgPred a valuable tool for predicting allergen cross-reactivity [32].
The NetCTL 1. [...] binding, proteasomal C-terminal cleavage, and TAP transport efficiency. The respective parameters employed for this analysis were set at a threshold of 0.9 to enhance sensitivity and specificity. [19] This allowed us to identify more potential epitopes for further analysis.
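
The propensity-scale methods named above for linear B-cell epitopes all follow the same general pattern: a per-residue scale is averaged over a sliding window, and stretches scoring at or above a threshold are reported. The sketch below illustrates only that pattern. The window size, the function names and the two-letter toy scale are assumptions made for illustration; the scale values are placeholders, not the published Karplus-Schulz, Kolaskar-Tongaonkar, Parker or Emini parameters.

```python
def window_profile(sequence, scale, window=7):
    """Return (centre_position, mean_score) for each full window.

    Positions are 1-based and refer to the residue at the window centre.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    profile = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        mean = sum(scale[res] for res in segment) / window
        profile.append((start + half + 1, mean))
    return profile


def regions_above(profile, threshold):
    """Group consecutive centre positions scoring >= threshold into ranges."""
    regions = []
    run_start = None
    prev = None
    for pos, score in profile:
        if score >= threshold:
            if run_start is None:
                run_start = pos
            prev = pos
        elif run_start is not None:
            regions.append((run_start, prev))
            run_start = None
    if run_start is not None:
        regions.append((run_start, prev))
    return regions


# Hypothetical toy scale and sequence, for illustration only.
TOY_SCALE = {"G": 0.8, "S": 1.2}
toy_sequence = "GGGSSSGGG"

profile = window_profile(toy_sequence, TOY_SCALE, window=3)
candidate_regions = regions_above(profile, threshold=1.0)

if __name__ == "__main__":
    print(candidate_regions)
```

With a real scale, the same two functions would be applied to the full 1273-residue sequence and the resulting regions compared across the four attributes.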
A combined algorithm of MHC-I binding, TAP transport efficiency, and proteasomal cleavage efficiency was selected to predict overall scores. [17, 33] These describe the crucial stages of the antigen presentation pathway. Overall, we performed T-cell epitope binding prediction for MHC-I molecules (human leukocyte antigens HLA-A*02:01, HLA-B*35:01 and HLA-B*51:01), which were selected based on their high global frequency [34]. The most probable ligands for these MHC-I molecules were identified (<-E) and presented accordingly. [35] [36] [37] [38]
CD4+ T-cell epitope prediction was carried out using NetMHCII 2.3, a method that incorporates an artificial neural network (ANN) algorithm for binding core and affinity predictions. The parameters employed for NetMHCII 2.3 were set at a threshold value of 0.7 to maintain high sensitivity and specificity. T-cell epitope binding prediction was performed for the MHC-II molecule HLA-DRB1*15:01. [39] The crystal structure of HLA-DRB1*15:01 was obtained from the PDB (ID 1BX2) for peptide-protein docking studies. [40]
Furthermore, the most probable T-cell epitopes (9-mers) were identified and their corresponding 3D structures were modeled using the PEP-FOLD3 algorithm. The prediction method utilized a simulation run of 200 ns in addition to the sOPEP energy function, which enabled the sampling of multiple predicted conformations. [41] The pepATTRACT method was further used to model interactions between the predicted peptides (T-cell epitopes) and HLA molecules using a blind docking approach. This method performs a rigid-body global search on the surface of the target protein and identifies the most appropriate sites for binding. [42] It was therefore well suited to determining the preferential binding regions for the epitopes on HLA-A*02:01, HLA-B*35:01, HLA-B*51:01 and HLA-DRB1*15:01.
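
The combined MHC-I score described above can be pictured as a weighted sum of the three pathway components followed by a threshold cut. The sketch below assumes weights of 0.15 for C-terminal cleavage and 0.05 for TAP transport; these are commonly quoted NetCTL defaults but are not stated in this paper, so they should be treated as assumptions. The peptide labels and scores are hypothetical placeholders, and only the 0.9 threshold is taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    peptide: str        # label of a candidate 9-mer (placeholder here)
    mhc_binding: float  # rescaled MHC-I binding score
    cleavage: float     # proteasomal C-terminal cleavage score
    tap: float          # TAP transport efficiency score


def combined_score(c, w_cleavage=0.15, w_tap=0.05):
    """Weighted sum of the three presentation-pathway scores.

    The default weights are assumed values, not taken from the paper.
    """
    return c.mhc_binding + w_cleavage * c.cleavage + w_tap * c.tap


def select_epitopes(candidates, threshold=0.9):
    """Return (peptide, score) pairs at or above threshold, best first."""
    scored = [(c.peptide, combined_score(c)) for c in candidates]
    kept = [item for item in scored if item[1] >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical scores for illustration only (not NetCTL output).
candidates = [
    Candidate("pep1", mhc_binding=0.80, cleavage=0.90, tap=1.00),
    Candidate("pep2", mhc_binding=0.50, cleavage=0.60, tap=0.40),
]
selected = select_epitopes(candidates, threshold=0.9)

if __name__ == "__main__":
    for peptide, score in selected:
        print(peptide, round(score, 3))
```

Because binding dominates the sum under these weights, cleavage and TAP scores act mainly as tie-breakers between peptides with similar binding scores.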
The best protein-peptide complexes were ranked based on global energy scores [43] and the docking results are presented accordingly.
Amino acid sequences for the SARS-CoV-2 S-protein were retrieved from the NCBI database with entry QHD43416.1, in addition to the primary sequences of other coronavirus strains (Table 1). Furthermore, the phylogenetic analysis revealed disparities between SARS-CoV-2 and other coronavirus strains throughout evolution. Sequences of the respective spike proteins were mapped out across the selected coronavirus strains and depicted as a phylogenetic tree (Supplementary Figure S1). As shown, the results highlighted the close relationship between SARS coronavirus (SARS-CoV) and SARS-CoV-2. In addition, the SARS-CoV-2 S-protein consists of 1273 amino acids; according to the secondary structure prediction, 364 amino acids (28.59%) were helical, 296 amino acids (23.25%) formed extended β-strands, and 570 amino acids (44.78%) constituted the random coil region of the protein (Supplementary Figure S2). The selection of the 3D structure was based on the obtained C-scores (confidence scores), which typically lie in the range -5 to +2. [44] Accordingly, the model with the highest C-score (-1.52) was selected (Figure 3). Also, about 1043 residues were located in the favoured region (82.1%), those in the allowed region numbered 195 (15.3%), and about 2.6% (33 residues) constituted the outliers. Taken together, a considerable degree of correctness can be presumed for our model, since about 97.4% of the residues of the predicted model lie within the favoured and allowed regions.
As stated earlier, the 3D structure of the antigenic spike protein was employed to predict conformational or discontinuous (non-linear) epitopes. Based on PI, the predicted non-linear epitopes are presented in Table 3, while their respective positions on the 3D structure are shown in Figure 5.
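
The Ramachandran validation figures quoted above can be re-derived from the three residue counts alone. The short sketch below does that arithmetic; the counts are those reported in the text, while the function and variable names are illustrative.

```python
def region_percentages(counts):
    """Percentage of residues per Ramachandran region, to one decimal place."""
    total = sum(counts.values())
    return {region: round(100 * n / total, 1) for region, n in counts.items()}


# Residue counts as reported for the validated model.
counts = {"favoured": 1043, "allowed": 195, "outlier": 33}

total_residues = sum(counts.values())
percentages = region_percentages(counts)
favoured_plus_allowed = round(
    100 * (counts["favoured"] + counts["allowed"]) / total_residues, 1
)

if __name__ == "__main__":
    print(total_residues, percentages, favoured_plus_allowed)
```

The counts sum to 1271 evaluated residues, two fewer than the 1273-residue chain, which is expected because the terminal residues lack one of the two backbone dihedral angles.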
The T-cell epitopes were predicted using NetCTL-I and NetMHC-II to identify potential T- [...]
The 3D structures of the selected T-cell epitopes, as modeled by the PEP-FOLD3 server, are presented in Figure 6. The interacting core region (9-mer) of the 15-mer T-cell epitope of MHC-II HLA-DRB1*15:01 was also modeled. A blind docking approach was employed to investigate the mechanisms of interaction between the predicted T-cell epitopes and the selected HLAs of MHC classes I and II. This method was important because it could identify regions on the HLA molecules where the epitopes would preferentially bind based on affinity and complementarity. Hence, the pepATTRACT method was sufficient to identify the most appropriate binding [...] were bound preferentially to hydrophobic patches similar to the ones experimentally identified by X-ray crystallography in previous studies by Yanaka et al. [37] and Pieper et al. [...]
The need for novel and highly effective treatments against SARS-CoV-2 virulence is urgent in order to help curtail the global pandemic. [45] Although information on its treatment and management is still elusive, remdesivir and chloroquine are currently being tested for their efficacy, since they are likely to interfere with viral entry and replication in host cells. [46] Vaccines are important treatment modalities since they can stimulate immunogenic responses against foreign antigens of the virus in the course of its pathogenesis. Since information on the cellular components of the novel coronavirus is available, the design of highly effective peptide or subunit vaccines is achievable, hence the importance of implementing immunoinformatics methods for predicting promising viral T-cell and B-cell epitopes. [47] This approach has previously been used to identify potential T-cell and B-cell epitopes for peptide design against the Zika virus [48], Dengue [49], Chikungunya [50], EBV [51], Ebola virus [52] and HIV-1.
Proteomic studies on the components of SARS-CoV-2 have revealed various antigenic proteins that perform diverse roles crucial to the infectious viral cycle, from viral entry to replication. These components include the spike glycoprotein (S), nucleocapsid, envelope protein, membrane protein, and hemagglutinin-esterase dimer protein (HE). Crucial to viral pathogenesis is the spike glycoprotein, which serves as the first point of contact for viral attachment and entry into host cells [8, 59]. This underlies our rationale for implementing the vaccinomics techniques performed in this study, complementary to other available data in this regard.
Characteristic epitope attributes such as antigenicity, hydrophilicity, surface exposure and surface accessibility, among others, are essential for B-cell receptor binding and recognition, which in turn is required to provoke B-cell-mediated immune responses. These factors were therefore considered for predicting potential linear B-cell epitopes for the SARS-CoV-2 S-glycoprotein. [...] predicted, which are peculiar to B-cell epitopes. Surface exposure was also an important attribute common to the predicted discontinuous/conformational epitopes, which interestingly overlapped with the linear epitope, further validating its potential as a B-cell epitope (Figure 4 / Table 3). Notably, the majority of residues that constitute the predicted epitopes are hydrophilic with large and aromatic side chains, corroborating the predicted surface accessibility and immunogenicity. [60]
The prediction of T-cell epitopes was further employed to identify 9-mer peptides that are antigenic, with an innate ability to initiate the activation of CD8 T-cells. Herein, we investigated epitope binding to human MHC-I and MHC-II molecules that are globally frequent: HLA- [...] To this effect, we implemented a blind peptide-docking approach wherein existing binding site information was not considered in the course of complex preparation.
Rather, the most probable predicted epitopes were allowed to attach preferentially, without restraints, to sites on the MHC molecules. Our findings revealed that the S-protein T-cell epitope YLQPRTFLL(269-277) bound preferentially to a novel site on HLA-A*02:01, which is adjacent to a previously characterized site. This could represent a novel site for the design of therapeutic T-cell stimulants for HLA-A supertypes with a view to impeding SARS-CoV-2 virulence. Further analyses of the interaction mechanisms revealed the roles of Leu277, Tyr269, Arg273, and Pro272 in stabilizing the epitope at the high-affinity site. However, epitope binding to the HLA-B molecules occurred at the same binding cleft that has been previously defined in studies by Yanaka et al. [37] and Pieper et al. [38], which further validates the correctness of the blind peptide-docking approach employed. In HLA-B*35:01, the binding and stability of the predicted epitope LPPAYTNSF(24-32) were enhanced by Phe32, Asn30, Thr29, Tyr28, and Pro25. Also, Thr716, Asn717, Ile720, and Val722 [...]
Although no effective treatment option has been discovered, we propose the viability of a peptide vaccine designed from B- and T-cell epitopes derived from the viral spike (S) protein. In this study, we implemented multiple algorithms to identify highly probable B- and T-cell epitopes for the antigenic SARS-CoV-2 S-protein, which is crucial for attachment and entry into host cells. Linear and discontinuous (non-linear) epitopes were predicted and ranked using multiple algorithms from the IEDB, with selection based on the inherent physicochemical attributes of the epitopes. Accordingly, flexibility, surface accessibility/exposure, hydrophilicity and antigenicity were considered for B-cell epitope prediction. The most probable CD4 and CD8 T-cell epitopes were also predicted, particularly their binding propensities to MHC-I and MHC-II molecules of the HLA-A, HLA-B and HLA-DRB1 supertypes.
These predictions were also performed by taking into consideration the antigen presentation pathways. Using a blind peptide docking approach, a novel site was identified for the selective binding of [...] Findings from this study indicate that the B- and T-cell epitopes predicted in this study are highly probable, which presents them as viable candidates for developing peptide vaccines for COVID-19 treatment.
Corresponding amino acid sequences, as predicted, are also shown (cyan highlights).
Understanding the latest human coronavirus threat
Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding
A pneumonia outbreak associated with a new coronavirus of probable bat origin
A Novel Coronavirus Emerging in China - Key Questions for Impact Assessment
Human coronaviruses associated with upper respiratory tract infections in three rural areas of Ghana
Human Coronaviruses: A Review of Virus-Host Interactions
Genomic characterization of the 2019 novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting Wuhan
Mechanisms of coronavirus cell entry mediated by the viral spike protein
Spike protein fusion peptide and feline coronavirus virulence
The coronavirus nucleocapsid is a multifunctional protein
Membrane binding proteins of coronaviruses
Pattern of early human-to-human transmission of Wuhan 2019-nCoV
Remdesivir and chloroquine effectively inhibit the recently emerged novel coronavirus (2019-nCoV) in vitro
Peptide-Based Vaccines: Current Progress and Future
T-cell epitope vaccine design by immunoinformatics
Vaccinomics and a new paradigm for the development of preventive vaccines against viral infections.
Immunoinformatics and epitope prediction in the age of genomic medicine
Fundamentals and Methods for T- and B-Cell Epitope Prediction
Large-scale validation of methods for cytotoxic T-lymphocyte epitope prediction
ViralZone: A knowledge resource to understand virus diversity
An Introduction to Molecular Phylogenetic Analysis
In silico characterization of pectate lyase protein sequences from different source organisms
Significant improvements in protein secondary structure prediction by consensus prediction from multiple alignments
Prefusion structure of a human coronavirus spike protein
A unified platform for automated protein structure and function prediction
Prediction of chain flexibility in proteins - A tool for the selection of peptide antigens
A semi-empirical method for prediction of antigenic determinants on protein antigens
New Hydrophilicity Scale Derived from High-Performance Liquid Chromatography Peptide Retention Data: Correlation of Predicted Surface Residues with Antigenicity and X-ray-Derived Accessible Sites
Induction of hepatitis A virus-neutralizing antibody by a virus-specific synthetic peptide
Determinants of antigenicity and specificity in immune response for protein sequences
A new structure-based tool for the prediction of antibody epitopes
AlgPred: Prediction of allergenic proteins and mapping of IgE epitopes
Epitope-based vaccine target screening against highly pathogenic MERS-CoV: An In Silico approach applied to emerging infectious diseases
HLA-A, B and DRB1 allele and haplotype frequencies in volunteer bone marrow donors from the north of Parana State
Structural basis for the killing of human beta cells by CD8+ T cells in type 1 diabetes
Nonstandard Peptide Binding Revealed by Crystal Structures of HLA-B*5101 Complexed with HIV Immunodominant Epitopes
Peptide-dependent conformational fluctuation determines the stability of the human leukocyte antigen class I complex
Memory T cells specific to citrullinated α-enolase are enriched in the rheumatic joint
HLA polymorphism of the Zhuang population reflects the common HLA characteristics among Zhuang-Dong
Crystal structure of HLA-DR2 (DRA*0101, DRB1*1501) Complexed with a peptide from human myelin basic protein
PEP-FOLD: An updated de novo structure prediction server for both linear and disulfide bonded cyclic peptides
The pepATTRACT web server for blind, large-scale peptide-protein docking
Fully Blind Peptide-Protein Docking with pepATTRACT
The I-TASSER suite: Protein structure and function prediction
Disease caused by the novel coronavirus officially has a name
Prophylactic and therapeutic remdesivir (GS-5734) treatment in the rhesus macaque model of MERS-CoV infection
Design of an epitope-based peptide vaccine against spike protein of human coronavirus: An in silico approach
From ZikV genome to vaccine: in silico approach for the epitope-based peptide vaccine against Zika virus envelope glycoprotein
Analysis of viral diversity for vaccine target discovery
Epitope characterization and docking studies on Chikungunya viral Envelope 2 protein
Immunoinformatics prediction of potential B-cell and T-cell epitopes as effective vaccine candidates for eliciting immunogenic responses against Epstein-Barr virus
In silico-based vaccine design against Ebola virus glycoprotein
Conserved HIV Epitopes for an Effective HIV Vaccine
Prediction of Epitopes of Viral Antigens Recognized by Cytotoxic T Lymphocytes as an Immunoinformatics Approach to Anti-HIV/AIDS Vaccine Design
A comprehensive in silico analysis for identification of therapeutic epitopes in HPV16, 18, 31 and 45 oncoproteins
Recent advances in antigen processing and presentation
Recent Advances in Subunit Vaccine Carriers.
The proteasome and MHC class I antigen processing
A new coronavirus associated with human respiratory disease in China
BepiPred-2.0: Improving sequence-based B-cell epitope prediction using conformational epitopes
Defining CD8+ T Cell Determinants during Human Viral Infection in Populations of Asian Ethnicity
The authors thank the School of Health Sciences, University of KwaZulu Natal for infrastructural support.
The authors declare none.
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.