key: cord-0002427-d44eqqah
authors: Saffari, Babak; Mohabatkar, Hassan
title: Computational Analysis of Cysteine Proteases (Clan CA, Family C1) of Leishmania major to Find Potential Epitopic Regions
date: 2009-11-25
journal: Genomics Proteomics Bioinformatics
DOI: 10.1016/s1672-0229(08)60037-6
sha: 06c4f9daabdf5ee16fb4b581a954e3909d9f17a0
doc_id: 2427
cord_uid: d44eqqah

Leishmania is associated with a broad spectrum of diseases, ranging from simple cutaneous to invasive visceral leishmaniasis. Here, the sequences of ten cysteine proteases of types A, B and C of Leishmania major were obtained from the GeneDB database. MHC class I epitopes of these cysteine proteases were predicted with the NetCTL program, version 1.2. In addition, the BcePred server was used to predict different structural properties of the proteins in order to find their potential B cell epitopes. According to this computational analysis, nine regions were predicted as B cell epitopes. The results provide useful information for designing peptide-based vaccines.

Leishmania (Order: Kinetoplastida, Family: Trypanosomatidae) is an obligate intracellular parasite responsible for a broad spectrum of diseases, ranging from simple cutaneous to invasive visceral leishmaniasis (1). Protozoan parasites of the genus Leishmania present two forms in their life cycle: the promastigote, which multiplies in the midgut of the sand fly vector, and the amastigote, the obligate intracellular form that lives within phagolysosomes of the vertebrate host (2, 3). Three major types of leishmaniasis, namely cutaneous, mucocutaneous and visceral, occur in humans depending on the species of Leishmania. Infection by species such as L. major, L. tropica and L. mexicana may cause localized cutaneous lesions, resulting in lifelong immunity. Infection by L. braziliensis and L. panamensis initially presents as cutaneous lesions that may then spread or metastasize, causing mucocutaneous lesions. Infection by L. donovani, L.
infantum and L. chagasi may result in a chronic disseminating visceral disease in the liver and spleen (4).

It has been recognized for many years that proteases of pathogenic organisms may modulate the host's defense mechanisms (5). Proteases are grouped into clans and families on the basis of the architecture of their catalytic dyad or triad (6). Each clan is identified with two letters, the first representing the catalytic type of the families included in the clan. L. major has cysteine proteases (CPs) of eight families within clan CA. Family C1 contains CPA and CPB, which are both cathepsin L-like in terms of primary amino acid sequence, and CPC, which is cathepsin B-like. CPB is unusual in that it has a 100-amino acid C-terminal extension in comparison with most CPs of the group, and exists as multiple isoenzymes, which are encoded by a tandem array of similar CPB genes located in a single locus (the array comprises eight genes in L. major). The CPBs of L. mexicana are stage-regulated and the isoforms differ in their substrate specificity and catalytic properties (7). Although the exact roles of CPs in Leishmania pathogenesis are unclear, it has been demonstrated that Leishmania cannot grow within macrophages in the presence of CP inhibitors (8). These observations provide evidence of the importance of these molecules in the survival of both promastigote and amastigote forms of these parasites (9).

Although trypanosomatid CPs may be instrumental in modulating the host's immune response to favor parasite survival and proliferation, they are themselves immunogenic. L. mexicana CP is a T cell immunogen, resulting in the development of potentially protective Th1 cell lines (10). This finding suggests that the CP itself is a vaccine candidate and that homologous enzymes in other species may also be so. A CP of L.
pifanoi, however, provided rather little protection for the host against infection with the parasite (11), although more recently a similar L. amazonensis CP provided some protection against subsequent challenge, apparently through inducing a Th1-associated response (12). These differing results presumably reflect the complexity of the immune response to the active parasite enzymes and how the response may be determined by the precise immunization conditions. It is therefore encouraging that a CP-rich fraction of L. major was shown to be a strong inducer of a primed human immune response and may have a protective function (13). These observations suggest that trypanosomatid CPs have potential as vaccines, although attempts to exploit them are only just beginning (14, 15). In addition, it has been shown that infected dendritic cells are the critical antigen-presenting cells responsible for T cell priming in Leishmania infections. Amastigotes, but not the infectious promastigotes, are the main targets for the phagocytic activity of dendritic cells (16, 17).

A key step in the design of subunit vaccines is the identification of epitopes from overlapping synthetic peptides. This method decreases the possibility of missed epitopes, but many peptides must be synthesized at high cost. New developments in immunoinformatics and other computational methodologies, combined with the broad versatility in the design and synthesis of genetic (DNA) vaccines, underlie new strategies for the design of antigen-specific, epitope-based vaccines against many pathogens that have so far proven refractory to conventional vaccine therapy (18). Epitopes are selected by prediction with software, which saves the expense of synthetic peptides and working time (19, 20).
Basically, the recognition of antigenic epitopes by the immune system, either small discrete T cell epitopes or large conformational epitopes recognized by B cells and soluble antibodies, is the key molecular event at the heart of the immune response to pathogens (21).

The objective of this bioinformatics-based study is to improve the selection of epitopic regions of the clan CA, family C1 cysteine proteases as potential targets of the immune response. Consensus sequence methodology was used to identify sequences of 9 amino acids or longer with complete conservation in 80% or more of the C1 family cysteine proteases. These conserved sequences were further analyzed to identify targets for candidate epitope-based T cell vaccine formulations against L. major. Furthermore, because leishmaniasis also activates human humoral immunity, B cell epitopes were predicted based on propensity scales for each of the 20 amino acids. Using bioinformatics servers to find vaccine candidates is a time-saving approach that could significantly increase our knowledge of various aspects of the molecular biology of pathogens. These theoretical predictions can then be tested by experimental and complementary methods.

For the prediction of major histocompatibility complex (MHC) class I binding peptides of the C1 family of cysteine proteases, one sequence for each of the cysteine proteases A, B and C was selected. In the case of CPB, due to the high similarity between these cysteine proteases, only one out of eight sequences was chosen (LmjF08.1050). This sequence is a good representative of CPB because it is completely identical to the CPB consensus sequence. All overlapping nonamer peptides were generated from this dataset and were screened for potential T cell antigens using the NetCTL algorithm, from which 716 peptides were short-listed.
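The generation of overlapping nonamers described above can be illustrated with a minimal sketch. It assumes a plain one-letter amino acid string as input; the function name `overlapping_peptides` and the toy sequence are hypothetical and are not taken from the study or from NetCTL, which performs this windowing internally.

```python
from typing import Iterator, Tuple


def overlapping_peptides(sequence: str, length: int = 9) -> Iterator[Tuple[int, str]]:
    """Yield (1-based start position, peptide) for every window of `length` residues."""
    seq = sequence.strip().upper()
    # A sequence of n residues gives n - length + 1 overlapping windows
    # (none at all if the sequence is shorter than the window).
    for start in range(len(seq) - length + 1):
        yield start + 1, seq[start:start + length]


# Hypothetical toy sequence for illustration; NOT a real L. major protease.
toy_sequence = "MKTLLVAAALCVSAQ"
nonamers = list(overlapping_peptides(toy_sequence))

for position, peptide in nonamers:
    print(position, peptide)
```

Consecutive windows are shifted by one residue, so a protein of n residues yields n - 8 candidate nonamers, each of which would then be scored against every supertype.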
Most of the peptides were found to exhibit mono-supertype specificity, meaning that they bind to a single supertype. Some of them, however, appeared to bind to multiple supertypes; the highest number of supertypes a given nonamer could bind is 5 out of the 12 supertypes tested. In fact, out of the 716 human leukocyte antigen (HLA)-binding nonamers, one peptide binds to 5 supertypes, 4 bind to 4 supertypes, 9 bind to 3 supertypes and 44 bind to 2 supertypes. The sequences of the peptides that bind to 4 and 5 supertypes and the names of the proteins they belong to are summarized in Table 1.

Knowing the number of binding peptides of each of the analyzed proteins is important, considering the polymorphic nature of HLA and its diversity in populations of different geographical regions. A good T cell antigen should therefore have peptides recognized by many HLA alleles. The analysis revealed that CPB has the maximum number of binding peptides, followed by CPC and CPA, respectively (Table 2). For short-listing potential vaccine candidates, it is also important to analyze the binding profiles from a supertype perspective. Of the 12 supertypes studied, the largest number of nonamers was found to be recognized by supertype B62 (53), followed by B58 (38), A2 (36), A24 (24), A1, B8 and B39 (21), B44 (20), A3 and B7 (19), B27 (17) and A26 (15), as illustrated in Figure 1. The score with which a peptide binds to HLA ranges from 0.75 to 3.1949. In general, the binding score of CPC peptides to A1 locus supertypes is higher than that of CPB and CPA peptides and than the scores for other supertypes (Table 3).

Putative promiscuous T cell epitopes may be localized in clusters, as reported in studies of HIV-1 (22) (23) (24) (25), the major outer membrane protein of Chlamydia trachomatis (26), and others (27, 28). The clusters are also ideal for developing epitope-based vaccines because they contain multiple promiscuous epitopes.
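The tallies reported above (how many peptides bind 1, 2, 3, ... supertypes, and how many peptides each supertype recognizes) can be derived from a table of predicted binders in a few lines. The sketch below is illustrative only: the peptide labels and supertype assignments are invented, the function names are hypothetical, and the cut-off of 4 supertypes simply mirrors the Table 1 selection.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical predictions: peptide label -> supertypes whose score passed the
# binding threshold. These labels and assignments are invented for illustration.
toy_binders: Dict[str, Set[str]] = {
    "pep1": {"A1", "A2", "B58", "B62"},
    "pep2": {"B62"},
    "pep3": {"A2", "B62"},
    "pep4": {"A24"},
    "pep5": {"A1", "A3", "B7", "B58", "B62"},
}


def promiscuity_histogram(binders: Dict[str, Set[str]]) -> Counter:
    """Map 'number of supertypes bound' to 'number of peptides'."""
    return Counter(len(supertypes) for supertypes in binders.values())


def supertype_load(binders: Dict[str, Set[str]]) -> Counter:
    """Map each supertype to the number of peptides predicted to bind it."""
    return Counter(st for supertypes in binders.values() for st in supertypes)


def promiscuous_peptides(binders: Dict[str, Set[str]], min_supertypes: int = 4) -> List[str]:
    """Return the peptides that bind at least `min_supertypes` supertypes."""
    return sorted(p for p, s in binders.items() if len(s) >= min_supertypes)


histogram = promiscuity_histogram(toy_binders)
load = supertype_load(toy_binders)
selected = promiscuous_peptides(toy_binders)
print(dict(histogram), load.most_common(1), selected)
```

The first tally corresponds to the per-peptide view (mono-supertype versus multi-supertype binders) and the second to the per-supertype view shown in Figure 1.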
The number of immunogenic hotspots for CPA, CPB and CPC is 4, 5 and 0, respectively, as shown in Table 4.

The identification of conserved sequences is very important for the design of peptide vaccines, because vaccines developed on the basis of segments conserved among candidate proteins can be used against a large majority of a pathogen's variants. In Figure 2, three segments (I, II and III) with ≥90% identity and a length of ≥9 amino acids are shown as conserved regions. Epitopes predicted in these regions are therefore of particular interest. For immunological applications, a minimum conserved sequence length of 9 amino acids is required because this represents the typical length of peptides that bind to HLA molecules (29). The features of potential epitopes located in conserved regions with maximum scores are summarized in Table 5.

Before B cell epitopes of CPB (LmjF08.1050) were predicted, the signal peptide of this protein, predicted by the SignalP 3.0 hidden Markov model (HMM) (signal peptide probability 0.999, signal anchor probability 0.001, cleavage site probability 0.760 between residues 27 and 28), was excluded. Hydrophilicity, flexibility, accessibility, turns, exposed surface, polarity and antigenic propensity scales were applied to predict B cell epitopes. These parameters were correlated with the location of continuous epitopes. As a result, 9 regions were predicted to be B cell epitopes (Figure 2).

The aim of this investigation was to apply bioinformatics methods to study the B and T cell epitopic sites of C1 family cysteine proteases of L. major. To help the development of vaccines, it is necessary to understand the structural basis of the cell-mediated immune response (30). Accurate bioinformatics prediction of T cell epitopes can greatly reduce the experimental cost of candidate epitope identification (31). In the present study, the NetCTL program was used to predict MHC class I epitopes of cysteine proteases A, B and C of L.
major (32). CP proteins are immunogenic and are potential vaccine candidates. Efficient processing and presentation of vaccine antigens by class I and/or class II MHC are essential for a good T cell response. Since each person carries only a limited number of co-dominant HLA alleles (2 each for the A, B and C loci) out of the hundreds of polymorphic alleles present in the population, a candidate vaccine should generate peptides that bind to a wide range of HLA molecules to provide good population coverage. In this work we found that the generated peptides bind in larger numbers to B supertypes. However, almost all of the peptides with the highest binding scores belong to CPC. In other words, CPC is the major source of peptides that bind to HLA molecules with higher affinity. These observations suggest that greater emphasis should be placed on the cytotoxic T lymphocyte (CTL) response generated by the presentation of antigen by B alleles, and that epitope-based vaccines should be directed towards these HLA molecules.

T cell epitopes specific to multiple HLA supertypes are advantageous for vaccine design because they effectively increase the number of epitopes to which an individual can respond, and provide much more extensive coverage of the population (33). Peptides that bind to more than one HLA are termed promiscuous, and such peptides are of prime interest for vaccine design because they cover higher proportions of human populations. In silico approaches help to predict some of the HLA-binding motifs that could act as promiscuous epitopes (34). Most of the nonameric peptides generated in this work are mono-allelic binders. To cover the majority of the population, it is essential to have vaccine candidates with multi-binding behavior. Consequently, peptides able to bind ≥4 supertypes were taken as promiscuous epitopes.
Note that each supertype consists of multiple HLA alleles, so peptides that can bind to ≥4 supertypes have great potential to activate a large proportion of the T cell population.

It is generally recognized that conserved protein sequences represent important functional domains (35), for which mutations would be detrimental to the survival of the pathogen. The functions of conserved sequences can be elucidated with databases that comprise data on protein families, domains and functional sites, such as the Pfam database (www.sanger.ac.uk/Software/Pfam) (36). Figure 2 shows, in addition to the ClustalW consensus sequence, the results from the Pfam database and the highly conserved regions that are ≥9 amino acids in length. Predicted epitopes located in the conserved segments are therefore more likely to be valid.

Finally, identifying proteins whose peptides bind a larger number of alleles, and assessing which MHC alleles or supertypes bind more peptides than others, are of great importance in selecting epitopes as vaccine candidates. In addition, allelic variation in binding affinity, immunological hotspots, HLA distribution analysis and similarity of epitopes to self proteins play a key role in the identification of these epitopes (34) (35) (36).

In proteins, turns are located on the surface; these parts are accessible and hydrophilic, whereas the core regions are mostly devoid of water molecules (37). Antigenic determinants lie in regions that are hydrophilic, exposed and polar, and the accessibility and flexibility of these segments are high. This has led to rules that allow the position of B cell epitopes to be predicted from these features of the protein sequence (37, 38).

In conclusion, recognizing epitopes on proteins is essential for developing synthetic vaccines and can facilitate immunotherapy of leishmaniasis and many other infectious diseases.
In the present work, a bioinformatics approach was used to identify a set of peptides that can be used in either a natural or a synthetic vaccine cocktail. This approach could be extended to the entire proteome of L. major to identify new sets of potentially antigenic proteins while reducing the number of T and B cell antigens that require experimental verification. This kind of research can also be applied to omit non-functional sequences of proteins, which would help in designing new immunological methods.

The sequences of ten cysteine proteases (clan CA, family C1) of L. major were obtained from the GeneDB database (www.genedb.org) (39). The sequences included one CPA (Systematic ID: LmjF19.1420), one CPC (LmjF29.0820) and eight CPBs (LmjF08.1010, LmjF08.1020, LmjF08.1030, LmjF08.1040, LmjF08.1050, LmjF08.1060, LmjF08.1070 and LmjF08.1080).

The NetCTL program, version 1.2 (http://www.cbs.dtu.dk/services/NetCTL/) (40), predicts peptides restricted to 12 HLA class I supertypes (A1, A2, A3, A24, A26, B7, B8, B27, B39, B44, B58 and B62), integrating predictions of HLA binding, proteasomal C-terminal cleavage and transport efficiency by the transporter associated with antigen processing (TAP). HLA binding and proteasomal cleavage predictions were performed by an artificial neural network (ANN) method, and TAP transport efficiency was predicted using a weight matrix method. The parameters used for NetCTL prediction were: 0.15 weight on C-terminal cleavage, 0.05 weight on TAP transport efficiency, and 0.75 threshold for HLA supertype binding. The final scores are the predicted MHC class I affinities in the form of −logIC50 and IC50 values.

All B cell epitope prediction calculations were based on propensity scales for each of the 20 amino acids. The sequence of each protein was read as a moving window. In order to compare the profiles obtained by different methods, the scales were normalized so that the original values of each scale were set between +3 and −3.
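The moving-window reading of a normalized propensity scale can be sketched as follows. Several details are assumptions made for illustration and are not stated in the text: the normalization is taken to be a linear min-max rescaling onto [−3, +3], the window score is the plain average assigned to the central residue, and the window size, threshold and four-residue toy scale are hypothetical (they are not the values of any published scale or of the BcePred defaults).

```python
from typing import Dict, List


def normalize_scale(scale: Dict[str, float], low: float = -3.0, high: float = 3.0) -> Dict[str, float]:
    """Linearly rescale propensity values so the minimum maps to `low` and the maximum to `high`."""
    lo, hi = min(scale.values()), max(scale.values())
    if hi == lo:
        raise ValueError("scale must contain at least two distinct values")
    return {aa: low + (v - lo) * (high - low) / (hi - lo) for aa, v in scale.items()}


def window_profile(sequence: str, scale: Dict[str, float], window: int = 7) -> List[float]:
    """Average the scale over every window of `window` consecutive residues."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    values = [scale[aa] for aa in sequence]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def residues_above(profile: List[float], threshold: float, window: int) -> List[int]:
    """Return 1-based positions of window-centre residues whose score exceeds the threshold."""
    half = window // 2
    return [i + half + 1 for i, score in enumerate(profile) if score > threshold]


# Hypothetical toy scale and sequence, for illustration only.
toy_scale = {"A": 0.0, "D": 2.0, "K": 4.0, "L": -2.0}
normalized = normalize_scale(toy_scale)
profile = window_profile("LLADKKDALL", normalized, window=3)
candidates = residues_above(profile, threshold=1.5, window=3)
print(normalized, profile, candidates)
```

Rescaling every scale onto the same range is what makes a single threshold comparable across hydrophilicity, flexibility, accessibility and the other properties; runs of consecutive residues above the threshold would then be reported as candidate continuous epitopes.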
Hydrophilicity (41), flexibility (42), accessibility (43), turns (44), exposed surface (45), polarity (46) and antigenic propensity (47) scales were applied to predict B cell epitopes with the BcePred server (http://www.imtech.res.in/raghava/bcepred) using the default threshold. Because the signal peptides of CPBs are removed before secretion to the outer space of the infected cells, this region was excluded from the protein sequence before the prediction analysis. Signal peptide prediction was performed using SignalP 3.0 HMM (http://www.cbs.dtu.dk/services/SignalP-3.0/) (48).

Putative promiscuous T cell epitopes may be localized in clusters that are also ideal for developing epitope-based vaccines because they contain multiple promiscuous epitopes. The MULTIPRED server (http://antigen.i2r.astar.edu.sg/multipred/) (49) was used to determine the immunogenic hotspots.

Sequences were aligned using the ClustalW program (50) from the BioEdit v5.0.9 package (51).

The developmental biology of Leishmania promastigotes
Molecular determinants of Leishmania virulence
Human leishmaniasis: clinical, diagnostic, and chemotherapeutic developments in the last 10 years
Leishmaniasis. Public health aspects and control
The stage-regulated expression of Leishmania mexicana CPB cysteine proteases is mediated by an intercistronic sequence element
Evolutionary lines of cysteine peptidases
Are bacterial proteases pathogenic factors?
Evidence from disruption of the lmcpb gene array of Leishmania mexicana that cysteine proteases are virulence factors
Leishmania mexicana cysteine protease-deficient mutants have attenuated virulence for mice and potentiate a Th1 response
Antigen presentation by Leishmania mexicana-infected macrophages: activation of helper T cells specific for amastigote cysteine proteases requires intracellular killing of the parasites
Leishmania pifanoi amastigote antigens protect mice against cutaneous leishmaniasis
Characterization of an antigen from Leishmania amazonensis amastigotes able to elicit protective responses in a murine model
Biochemical analysis and immunogenicity of Leishmania major amastigote fractions in cutaneous leishmaniasis
Leishmania vaccines: old and new
Leishmania amastigote target antigens: the challenge of a stealthy intracellular parasite
Uptake of Leishmania major amastigotes results in activation and interleukin 12 release from murine skin-derived dendritic cells: implications for the initiation of anti-Leishmania immunity
Leishmania major-infected murine Langerhans cell-like dendritic cells from susceptible mice release IL-12 after infection and vaccinate against experimental cutaneous leishmaniasis
A systematic bioinformatics approach for selection of epitope-based vaccine targets
The use of bioinformatics for identifying class II-restricted T-cell epitopes
Identification of immunodominant Th1-type T cell epitopes from Schistosoma japonicum 28 kDa glutathione-S-transferase, a vaccine candidate
Quantitative approaches to computational vaccinology. Immunol
Three regions of HIV-1 gp160 contain clusters of immunodominant CTL epitopes
Localization of CD4+ T cell epitope hotspots to exposed strands of HIV envelope glycoprotein suggests structural influences on antigen processing
Clustering of Th cell epitopes on exposed regions of HIV envelope despite defects in antibody activity
Construction of peptides encompassing multideterminant clusters of human immunodeficiency virus envelope to induce in vitro T cell responses in mice and humans of multiple MHC types
Epitope clusters in the major outer membrane protein of Chlamydia trachomatis
SARS coronavirus nucleocapsid immunodominant T-cell epitope cluster is common to both exogenous recombinant and endogenous DNA-encoded immunogens
Prediction of class I T-cell epitopes: evidence of presence of immunological hot spots inside antigens
Chemistry of peptides associated with MHC class I and class II molecules
Vaccines and cell-mediated immunity
Prediction of epitopes and structural properties of Iranian HPV-16 E6 by bioinformatics methods. Asian Pac
(PS)2: protein structure prediction server
Development of a DNA vaccine designed to induce cytotoxic T lymphocyte responses to multiple conserved epitopes in HIV-1
A combined immuno-informatics and structure-based modeling approach for prediction of T cell epitopes of secretory proteins of Mycobacterium tuberculosis
Scoring residue conservation
The Pfam protein families database
Predicting location of continuous epitopes in proteins from their primary structures
Prediction of exposed domains of envelope glycoprotein in Indian HIV-1 isolates and experimental confirmation of their immunogenicity in humans. Braz
GeneDB: a resource for prokaryotic and eukaryotic organisms
An integrative approach to CTL epitope prediction: a combined algorithm integrating MHC class I binding, TAP transport efficiency, and proteasomal cleavage predictions
New hydrophilicity scale derived from high-performance liquid chromatography peptide retention data: correlation of predicted surface residues with antigenicity and X-ray-derived accessible sites
Prediction of chain flexibility in proteins: a tool for the selection of peptide antigens
Induction of hepatitis A virus-neutralizing antibody by a virus-specific synthetic peptide
Correlation between the location of antigenic sites and the prediction of turns in proteins
A semiempirical method for prediction of antigenic determinants on protein antigens
Conformation of amino acid side-chains in proteins
Hydrophobic packing and spatial arrangement of amino acid residues in globular proteins
Improved prediction of signal peptides: SignalP 3.0
MULTIPRED: a computational system for prediction of promiscuous HLA binding peptides
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice
BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT

This work was supported by a grant (No. 88-GR-SC-47) from Shiraz University.

BS conceived the study and carried out the computational analysis. HM supervised the study. BS and HM prepared the manuscript. Both authors read and approved the final manuscript.

The authors have declared that no competing interests exist.