key: cord-0056455-fm65ubuj authors: Khan, Fariya; Kumar, Ajay title: An integrative docking and simulation-based approach towards the development of epitope-based vaccine against enterotoxigenic Escherichia coli date: 2021-02-18 journal: Netw Model Anal Health Inform Bioinform DOI: 10.1007/s13721-021-00287-6 sha: a22e1709608ed8c9534bab8600d5efdbf583e4e8 doc_id: 56455 cord_uid: fm65ubuj Enterotoxigenic E. coli (ETEC) causes diarrheal illness in children as well as adults, with the majority of cases occurring in developing countries. To reduce the number of cases worldwide, an effective vaccine against these bacteria is needed. This in silico work used modern bioinformatics tools to investigate the proteome of ETEC strain E24377A. Different computational vaccinology approaches were deployed to assess several parameters, including antigenicity, allergenicity, stability, localization, molecular weight and toxicity, of the predicted epitopes required for a good vaccine candidate to elicit an immune response against diarrhea. We evaluated two known control epitopes, epitope (141)STLPETTVV(149) of Hepatitis B virus and epitope (265)ILRGSVAHK(273) of H1N1 Nucleoprotein, to corroborate our approach. Furthermore, molecular docking was performed to evaluate the interaction between HLA allele and peptide; the peptides QYGGGNSAL and LPYFELRWL were considered the most promiscuous T cell epitopes, with the most favorable binding energies of −2.09 kcal/mol and −1.84 kcal/mol, respectively. In addition, dynamic simulation revealed good stability of the vaccine construct, and population coverage analysis showed the highest population coverage in East Asia, India, Northeast Asia, South Asia and North America. These two epitopes can therefore be synthesized for wet lab analysis and could be considered promising vaccine candidates against diarrhea.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s13721-021-00287-6.

Enterotoxigenic Escherichia coli (ETEC) bacteria are a major cause of diarrheal disease among children and adults globally. ETEC is estimated to cause 280-400 million diarrheal cases in children under 5 years of age and 100 million cases in children above 5 years annually (WHO 2018). Infection is primarily caused by ingestion of contaminated food or water, which allows the strain to enter the body (Seo et al. 2019). ETEC colonizes intestinal epithelial cells and releases two enterotoxins, heat-labile toxin (LT) and heat-stable toxin (ST), which disrupt fluid and electrolyte homeostasis, leading to fluid hypersecretion and watery diarrhea (Huang et al. 2018). The first case of ETEC diarrhea was observed in 1956 in Kolkata, and the disease has since remained a global killer in places such as Dhaka (Bangladesh), the U.S. and India (Isidean et al. 2011). ETEC diarrhea is also referred to as traveler's diarrhea because travelers to developing countries are particularly vulnerable to it; it is endemic throughout the year but occurs mostly in the warm season (Qadri et al. 2005). There is no vaccine for ETEC diarrhea to date; an ETEC vaccine that induces a broad-spectrum immune response against ETEC strains is therefore needed. Previous literature shows that developing an ETEC vaccine has proven challenging, and most studies have focused on known virulence factors, mainly CF/CS antigens and heat-labile toxin (Holmgren et al. 2017). However, comprehensive studies indicate that the mode of ETEC infection extends beyond these antigens, and a new approach targeting multiple antigens in addition to these classical antigens could provide significant protection (Zhang and Sack 2015).
Antibiotic treatment is not recommended for diarrhea because it is costly and can lead to antibiotic resistance in ETEC, which has become a major public health problem worldwide (Shaheen et al. 2003). ETEC is also associated with long-term post-diarrheal effects, including malnutrition, growth stunting and impaired cognitive development, and therefore remains a leading cause of illness among children (Chakraborty et al. 2019). Most research has targeted known proteins involved in pathogenesis, and a few previous studies have also identified proteins shared between two or three strains using omics data (Mehla and Ramana 2016). The use of improved docking and simulation tools with better accuracy has, however, enabled the identification of previously uncharacterized proteins of ETEC strain E24377A. In the last few years, outbreaks of different viral and bacterial infections have caused many diseases, so vaccines that provide better prevention or treatment against these diseases are crucial. To minimize the cost of vaccine development, different technologies have been adopted, including computational immunology. The present study therefore used an immunoinformatic approach to screen for immunogenic epitopes from the most pathogenic and widespread ETEC strain causing diarrhea globally. The high cost, long timelines and practical limitations of traditional experimental vaccine development have led to the use of computational approaches, which apply epitope-based prediction methods to search for novel vaccine candidates across whole proteomes (Khan et al. 2019).
This approach can be termed "reverse vaccinology", which involves characterizing vaccine candidates by screening the complete proteome sequence of the targeted pathogen. In our study, only MHC-I T cell epitopes were predicted, because the accuracy of tools for predicting MHC-II restricted epitopes is much lower than that of MHC-I tools, and selecting a particular tool that gives accurate results for MHC-II T cell epitopes has therefore become a concern (Zawawi et al. 2020). To stimulate the different arms of the immune system, potential epitopes were predicted with different computational tools (Kumar et al. 2013). The proteome of the most pathogenic strain, E24377A, can help identify T cell epitopes for designing vaccine candidates (Khan et al. 2018). Furthermore, allergenicity and toxicity were predicted, the epitopes and HLA alleles were modeled, and the identified peptides were docked with the MHC molecules.

For our analysis, the complete proteome of the most widespread and pathogenic strain, Escherichia coli E24377A, was selected. The strain carries 6 plasmids, and its proteome comprises a total of 4915 protein-coding sequences (Rasko et al. 2008). The FASTA file of the protein sequences for proteome ID UP000001122 was retrieved from the UniProt database (Morgat et al. 2019), which stores protein sequences and functional information. The Vaxign tool was applied to screen for antigens with an adhesin value ≥ 0.51 (Xiang and He 2009). This tool determines not only the adhesin probability of proteins but also their localization, orthologs and transmembrane properties. Characterizing the adhesin probability of an antigenic protein indicates the ability of the pathogen to bind to the host and thus targets the mode of action of the bacteria.
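The proteome retrieval and adhesin screen described above can be illustrated with a minimal sketch. It assumes that adhesin probabilities have already been exported from Vaxign into a plain accession-to-score mapping; the function names `parse_fasta` and `filter_adhesins` and the example accessions are hypothetical and are not part of Vaxign or UniProt. Only the 0.51 cut-off comes from the text.

```python
from typing import Dict, Iterable, List

ADHESIN_CUTOFF = 0.51  # adhesin threshold stated in the text


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {accession: sequence}.

    Assumes headers of the form '>db|ACCESSION|ENTRY_NAME description';
    otherwise the first whitespace-delimited token is used as the key.
    """
    chunks: Dict[str, List[str]] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            tokens = line[1:].split()
            header = tokens[0] if tokens else ""
            parts = header.split("|")
            current = parts[1] if len(parts) >= 3 else header
            chunks[current] = []
        elif current is not None:
            chunks[current].append(line)
    return {acc: "".join(seq) for acc, seq in chunks.items()}


def filter_adhesins(scores: Dict[str, float],
                    cutoff: float = ADHESIN_CUTOFF) -> List[str]:
    """Return accessions whose adhesin probability meets the cut-off."""
    return sorted(acc for acc, p in scores.items() if p >= cutoff)
```

In practice the score mapping would be filled from the screening tool's own output; the sketch only makes the thresholding step explicit.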
The flowchart depicting the complete methodology of the analysis is shown in Fig. 1.

Fig. 1 Flowchart representing the methodology used in the analysis for T cell epitope design

Antigenicity of the filtered proteins was analyzed with the VaxiJen server. VaxiJen predicts antigens irrespective of sequence length and without alignment, using three models, i.e. bacteria, virus and tumour (Doytchinova and Flower 2007). These three models have shown remarkable stability; the threshold value for the analysis was therefore set to 0.51, and proteins scoring above this value were marked as antigenic. Only proteins with VaxiJen scores above the cut-off value were selected for further study, and low-scoring ones were eliminated. The AllergenFP v.1.0 tool was used to differentiate between allergenic and non-allergenic proteins (Dimitrov et al. 2014). Cytotoxic T lymphocyte (CTL) epitopes play an essential role in rational vaccine design, so predicting CTL epitopes can minimize the experimental effort needed to identify epitopes. T cell epitopes binding with high affinity to HLA class I alleles were predicted using the NetCTL 1.2 tool (Larsen et al. 2007). NetCTL is a web-based tool designed to predict human CTL epitopes in any given protein sequence; it combines predictions of proteasomal cleavage, TAP transport efficiency and MHC class I affinity. NetCTL 1.2 showed improved performance and specificity compared with the older NetCTL 1.0. To screen for the best immunogenic epitopes with higher accuracy, the shortlisted peptides were submitted to the VaxiJen v2.0 server. Peptides with high scores were selected for further analysis, and epitopes with a score ≥ 1.4 were selected for toxicity prediction. The toxicity of the peptides was then predicted with ToxinPred.
This tool removes toxic peptides on the basis of their toxin score and retains only non-toxic peptides for analysis (Gupta et al. 2013). The three-dimensional structures of the selected epitopes were modeled with the PEPstrMOD tool (Kaur et al. 2007), which predicts the 3D structures of small peptides. The 3D structures of the alleles HLA-A*11:01, HLA-B*15:02 and HLA-B*15:03 were modeled with MODELLER 9.18 from sequences downloaded from the IPD-IMGT/HLA Database (Robinson et al. 2015). MODELLER relies on several parameters for loop modelling and assesses model quality using the DOPE score. The normalized form of this score is a Z-score that differentiates between poor and native-like models: positive scores indicate poor-quality models, whereas scores below −1 indicate good-quality models. MODELLER 9.18 generated five models, from which the best model was selected on the basis of DOPE score. The model was validated using Ramachandran plot analysis.

To initiate an appropriate immune response, interaction between the antigenic epitope and the receptor is mandatory. The next step was therefore docking of the designed peptides with the HLA alleles, for which AutoDock 4.2 (Goodsell et al. 1998) was used. AutoDock is a user-friendly, non-commercial program that is widely used by researchers (Morris et al. 2009). It uses a Lamarckian genetic algorithm combined with an empirical free-energy force field, which enables fast prediction of the bound conformations of ligand and macromolecule. It uses a grid-based method, and AutoDockTools is written in the object-oriented programming language Python. The tool generated 10 trial conformations, and the best output was selected on the basis of the most favorable (lowest) binding energy score.
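The pose selection step can be made concrete with a short sketch. It assumes the binding energies of the trial conformations have already been read from the docking log into a list; the helper names `best_pose` and `estimated_ki` are hypothetical. The conversion from binding free energy to an estimated inhibition constant uses the standard thermodynamic relation Ki = exp(ΔG / RT) at 298.15 K, which is an addition here for interpretation and is not a quantity reported in the text.

```python
import math
from typing import List, Tuple

GAS_CONSTANT_KCAL = 0.0019872036  # kcal/(mol*K)


def best_pose(energies: List[float]) -> Tuple[int, float]:
    """Return (index, energy) of the most negative binding energy."""
    if not energies:
        raise ValueError("no docking conformations supplied")
    idx = min(range(len(energies)), key=energies.__getitem__)
    return idx, energies[idx]


def estimated_ki(delta_g_kcal: float, temperature_k: float = 298.15) -> float:
    """Estimated inhibition constant in mol/L from a binding free energy."""
    return math.exp(delta_g_kcal / (GAS_CONSTANT_KCAL * temperature_k))


if __name__ == "__main__":
    # Hypothetical energies (kcal/mol) for three trial conformations.
    index, energy = best_pose([-1.20, -2.09, -0.50])
    print(index, energy, estimated_ki(energy))
```

Because the exponent is ΔG / RT, a more negative binding energy always maps to a smaller Ki, which is why the lowest energy is treated as the best pose.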
The Chimera 1.2 tool was used to visualize the orientations of the docked complex structures (Pettersen et al. 2004). This tool allows the user to visualize multiple sequence alignments, display volumetric data and view molecular dynamics trajectories of the docked models. A molecular dynamics (MD) study was performed with the MDWeb server to examine the molecular interactions and RMSD values. MDWeb provides a user-friendly, guided interface for setting up and running simulations; it supports full MD setup for Amber, NAMD and Gromacs, and analysis can be carried out on any standard trajectory format. It performs functions such as trajectory manipulation, per-residue analysis and flexibility analysis. The structure model, chains and residue locations were checked in the initial step, because even minor errors can lead to unstable trajectories.

To estimate the HLA allele distribution among the world population, the IEDB population coverage tool was used (Bui et al. 2006). Population coverage of the potential epitopes with their corresponding MHC-I alleles was analyzed with this tool, which reports its results as histograms showing HLA gene frequencies in specific populations. The tool is publicly accessible, covers 115 countries and supports the development of T cell epitope-based vaccines.

The genome of the selected strain E24377A has been completely sequenced; it comprises 4.9 Mb with 5305 coding sequences and 67 RNAs (Ashok et al. 2015). For the construction of vaccine candidates, a total of 222 protein sequences were shortlisted from the complete proteome of strain E24377A through Vaxign on the basis of the default adhesin cut-off value.
In the next step, 95 proteins were selected with the VaxiJen server and submitted to the AllergenFP tool to distinguish between allergenic and non-allergenic proteins. Only non-allergenic proteins were shortlisted for further study; the 9 selected proteins are A7ZKX4, A7ZU80, A7ZJC8, A7ZK46, A7ZHN2, A7ZKR1, A7ZVJ1, A7ZRC0 and A7ZKE5 (Table 1). The amino acid sequences of the selected proteins were submitted to NetCTL 1.2 for prediction of cytotoxic T cell epitopes. This method predicts MHC-restricted epitopes on the basis of proteasomal C-terminal cleavage and TAP transport efficiency and is restricted to 12 MHC class I supertypes. The threshold value for epitope identification was 0.75, while all other parameters were left at their default values. The results were sorted by combined score, and epitopes with good binding scores were considered good vaccine candidates (Table 2). Previous literature describes the importance of MHC-restricted T cell epitopes in vaccine design and how computational algorithms can reduce the time and effort involved (Singh et al. 2015). To select the most immunogenic epitopes, the epitopes identified by NetCTL 1.2 were further submitted to the VaxiJen tool, and peptides with scores ≥ 1.75 were selected. Peptides binding to MHC class I alleles with IC50 ≤ 500 nM were shortlisted for further analysis, which yielded four peptides with the highest scores. To evaluate the predicted epitopes as immunogens for vaccine candidates, rigorous methods were applied to assess their antigenic nature. These peptides were predicted to be non-toxic by the ToxinPred tool (Table 3).
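The successive cut-offs described above amount to a simple filtering cascade, sketched below. The thresholds (NetCTL ≥ 0.75, VaxiJen ≥ 1.75, IC50 ≤ 500 nM, non-toxic) are taken from the text; the `Candidate` record, the `shortlist` function and the placeholder peptides and scores in the usage note are hypothetical and do not reproduce the study's data or any tool's output format.

```python
from dataclasses import dataclass
from typing import List

# Thresholds quoted in the text.
NETCTL_MIN = 0.75    # NetCTL combined-score threshold
VAXIJEN_MIN = 1.75   # VaxiJen score cut-off for peptides
IC50_MAX_NM = 500.0  # MHC class I binding affinity cut-off (nM)


@dataclass(frozen=True)
class Candidate:
    peptide: str
    netctl_score: float
    vaxijen_score: float
    ic50_nm: float
    is_toxic: bool


def shortlist(candidates: List[Candidate]) -> List[Candidate]:
    """Keep peptides passing every filter, best VaxiJen score first."""
    kept = [
        c for c in candidates
        if c.netctl_score >= NETCTL_MIN
        and c.vaxijen_score >= VAXIJEN_MIN
        and c.ic50_nm <= IC50_MAX_NM
        and not c.is_toxic
    ]
    return sorted(kept, key=lambda c: c.vaxijen_score, reverse=True)
```

For example, a placeholder peptide with scores (0.90, 1.80, 120 nM, non-toxic) would be retained, whereas one with an IC50 of 900 nM would be dropped regardless of its other scores.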
All these parameters were considered when selecting proteins as potential candidates; proteins with more than two transmembrane regions are not considered good vaccine antigens because of the difficulty of cloning, expressing and purifying them. We therefore applied the TMHMM method, which classified all the selected antigens as having less than one transmembrane region, as shown in Table 3. The mechanism of ETEC infection depends primarily on the ability of the bacterium to bind to the host membrane, and all 9 antigens were classified as adhesins by the SPAAN program (Sachdeva et al. 2005). Checking the adhesion capability of pathogens experimentally is a rigorous and time-consuming process, whereas the computational algorithm SPAAN has shown a specificity of 100% in identifying adhesins.

The three-dimensional structures of the MHC-I HLA alleles HLA-A*11:01 and HLA-B*15:02, corresponding to the highest scoring epitopes, were modeled using MODELLER 9.18 (Table 4). Knowledge of the three-dimensional structure of proteins plays an essential role in describing their molecular functions and identifying their binding sites (Sali et al. 1995). According to the DOPE scores of the epitope and allele models, the model with the most negative value was selected. To further validate the overall quality of the modeled HLA allele structures, they were submitted to the RAMPAGE server, which showed that > 90% of the residues of the four modeled structures lie in the favored region, confirming the quality of the models. Huber et al. (2014) reported that epitopes are sufficient to trigger a strong immune response in comparison with whole protein sequences. The distribution of HLA alleles in specific populations is therefore very important for understanding the potential of a vaccine worldwide.
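The model selection and validation criteria described above (most negative DOPE score, more than 90% of residues in the favored Ramachandran region) can be combined in a small sketch. It assumes that the DOPE score and favored-region fraction of each candidate model have already been collected from the modeling and validation tools; the `Model` record, the `select_model` function and the values in the usage note are hypothetical.

```python
from typing import List, NamedTuple, Optional

FAVORED_MIN = 0.90  # fraction of residues required in the favored region


class Model(NamedTuple):
    name: str
    dope: float              # DOPE score (more negative is better)
    favored_fraction: float  # Ramachandran favored-region fraction, 0..1


def select_model(models: List[Model]) -> Optional[Model]:
    """Pick the lowest-DOPE model among those passing validation."""
    valid = [m for m in models if m.favored_fraction > FAVORED_MIN]
    if not valid:
        return None
    return min(valid, key=lambda m: m.dope)
```

With hypothetical inputs such as a model at DOPE −31000 with 88% favored residues and another at −30500 with 94%, the second is returned, because the first fails validation despite its lower score.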
The three potential epitopes were docked with their corresponding HLA alleles using AutoDock 4.2. Different conformations were generated, and the best conformation was selected on the basis of binding energy: the lower (more negative) the binding energy, the stronger the interaction between HLA allele and epitope. Peptide QYGGGNSAL, peptide LPYFELRWL and peptide CVILFFSIL, each docked with HLA-A*11:01 and HLA-B*15:02, were considered the most effective T cell epitopes, with best binding energies of −2.09 kcal/mol, −1.84 kcal/mol and −1.30 kcal/mol, respectively (Table 5). Of the three epitopes, the two with the best binding energies, QYGGGNSAL and LPYFELRWL, were selected and analyzed with Python Molecular Viewer, as shown in Figs. 2 and 3. The ligand-receptor complex is stabilized mainly by hydrogen bonds and hydrophobic interactions. The binding affinity of the ligand and protein can also be affected if the surrounding bulk water forms stronger bonds. Strong hydrogen bonds within the docked complex can therefore lead to better binding energies and better stability in simulation.

The RMSD values were calculated to check the stability of the selected epitopes interacting with their corresponding alleles. Preparing the complex for simulation requires several steps. In the first step, crystallographic water molecules are removed and hydrogen atoms and missing side chains are added. In subsequent steps, the heavy atoms are restrained to their positions with a force constant of 500 kJ/mol·nm², and a truncated box of TIP3P water molecules (for the Amber force field, or the equivalent for other force fields) is added at a distance of 15 Å around the molecule.
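The RMSD used as the stability measure can be written out explicitly. The following is a minimal sketch under the assumption that every trajectory frame has already been superimposed on the reference structure and lists the same atoms in the same order; it does not perform any fitting, and the function names are hypothetical rather than part of MDWeb.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(reference: Sequence[Coord], frame: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two pre-aligned coordinate sets."""
    if not reference or len(reference) != len(frame):
        raise ValueError("coordinate sets must be non-empty and equal in size")
    squared = sum(
        (a - b) ** 2
        for ref_atom, frame_atom in zip(reference, frame)
        for a, b in zip(ref_atom, frame_atom)
    )
    return math.sqrt(squared / len(reference))


def rmsd_series(trajectory: Sequence[Sequence[Coord]]) -> List[float]:
    """RMSD of every frame relative to the first frame of the trajectory."""
    reference = trajectory[0]
    return [rmsd(reference, frame) for frame in trajectory]
```

A series that rises and then plateaus is usually read as the complex settling into a stable conformation, whereas a steadily growing RMSD suggests drift away from the docked pose.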
To validate the stability of the vaccine construct over a period of time, simulation analysis was performed.

Population coverage was predicted with the IEDB tool. The two epitopes QYGGGNSAL and LPYFELRWL were found to be the most immunogenic, with the highest population coverage (> 50%) in East Asia, India, Northeast Asia and South Asia. Figures 6 and 7 show that the maximum coverage was 71.44% in Northeast Asia and 50.99% in India for epitope QYGGGNSAL, and 66.56% in Northeast Asia and 50.71% in India for epitope LPYFELRWL. The results indicate that the epitopes cover populations in all countries, with the highest coverage in the regions highlighted in both figures. Thus, the potential vaccine candidates cover a large share of the population worldwide and can benefit many ethnic groups.

We have predicted two epitopes, QYGGGNSAL and LPYFELRWL, for designing a vaccine against diarrhea. These epitopes were selected on the basis of parameters such as antigenicity, docking score, VaxiJen score, population coverage and stability over time. Although immunoinformatics methods are valuable for reducing the time and cost of vaccine design, they have limitations that cannot be neglected. First, results may vary between software tools; second, a multi-epitope vaccine can offer better immunity than a single-epitope approach. The epitopes identified in our analysis should therefore be tested experimentally in the laboratory to assess their potential as a vaccine that could provide protection worldwide.
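The population coverage percentages reported above can be understood through a simplified calculation. The sketch below is not the IEDB implementation: it assumes Hardy-Weinberg proportions within each HLA locus and independence between loci, so that an individual is covered if at least one of their alleles binds the epitope. The function name and the allele frequencies in the usage note are hypothetical.

```python
from typing import Dict, List


def population_coverage(binding_allele_freqs: Dict[str, List[float]]) -> float:
    """Fraction of individuals carrying at least one epitope-binding allele.

    Keys are HLA loci; values are population frequencies of the alleles at
    that locus predicted to bind the epitope.
    """
    uncovered = 1.0
    for locus, freqs in binding_allele_freqs.items():
        total = sum(freqs)
        if any(f < 0.0 for f in freqs) or total > 1.0:
            raise ValueError(f"invalid allele frequencies for locus {locus}")
        # Probability that both chromosomes carry a non-binding allele.
        uncovered *= (1.0 - total) ** 2
    return 1.0 - uncovered
```

For instance, with hypothetical binding-allele frequencies of 0.2 at HLA-A and 0.1 at HLA-B, the uncovered fraction is 0.8² × 0.9² = 0.5184, giving a coverage of 48.16%.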
Predicting population coverage of T-cell epitope-based diagnostics and vaccines
Interrogation of a live-attenuated enterotoxigenic Escherichia coli vaccine highlights features unique to wild-type infection
AllergenFP: allergenicity prediction by descriptor fingerprints
VaxiJen: a server for prediction of protective antigens, tumour antigens and subunit vaccines
Automated docking using a Lamarckian genetic algorithm and empirical binding free energy function
Open Source Drug Discovery Consortium (2013) In silico approach for predicting toxicity of peptides and proteins
Correlates of protection for enteric vaccines
Significance of enterotoxigenic Escherichia coli (ETEC) heat-labile toxin (LT) enzymatic subunit epitopes in LT enterotoxicity and immunogenicity
T cell responses to viral infections - opportunities for peptide vaccination
A systematic review of ETEC epidemiology focusing on colonization factor and toxin expression
PEPstr: a de novo method for tertiary structure prediction of small bioactive peptides
Epitope based peptide prediction from proteome of enterotoxigenic E. coli
Computational identification and characterization of potential T-cell epitope for the utility of vaccine design against enterotoxigenic Escherichia coli
Screening and structure-based modeling of T-cell epitopes of Marburg virus NP, GP and VP40: an immunoinformatic approach for designing peptide-based vaccine
Large-scale validation of methods for cytotoxic T-lymphocyte epitope prediction
Identification of epitope-based peptide vaccine candidates against enterotoxigenic Escherichia coli: a comparative genomics and immunoinformatics approach
Enzyme annotation in UniProtKB using Rhea
AutoDock4 and AutoDockTools4: automated docking with selective receptor flexibility
UCSF Chimera - a visualization system for exploratory research and analysis
Enterotoxigenic Escherichia coli in developing countries: epidemiology, microbiology, clinical features, treatment, and prevention
The pangenome structure of Escherichia coli: comparative genomic analysis of E. coli commensal and pathogenic isolates
The IPD and IMGT/HLA database: allele variant databases
SPAAN: a software program for prediction of adhesins and adhesin-like proteins using neural networks
Evaluation of comparative protein modeling by MODELLER
Antibodies induced by enterotoxigenic Escherichia coli (ETEC) adhesin major structural subunit and minor tip adhesin subunit equivalently inhibit bacteria adherence in vitro
Phenotypic diversity of enterotoxigenic Escherichia coli (ETEC) isolated from cases of travelers' diarrhoea in Kenya
A Japanese encephalitis vaccine from India induces durable and cross-protective immunity against temporally and spatially wide-ranging global field strains
Vaxign: a web-based vaccine target design program for reverse vaccinology
In silico design of a T-cell epitope vaccine candidate for parasitic helminth infection
Current progress in developing subunit vaccines against enterotoxigenic Escherichia coli-associated diarrhea

The authors gratefully acknowledge the computational facilities, sound supervision and generous support provided throughout the research work by the Department of Biotechnology, Faculty of Engineering & Technology, Rama University, Kanpur, U.P., India.

Conflict of interest: The authors declare that they have no conflict of interest.