key: cord-007719-3ypv9k9p
authors: Wang, Mingjun; Claesson, Mogens H.
title: Classification of Human Leukocyte Antigen (HLA) Supertypes
date: 2014-05-06
journal: Immunoinformatics
DOI: 10.1007/978-1-4939-1115-8_17
sha:
doc_id: 7719
cord_uid: 3ypv9k9p

Identification of new antigenic peptides, derived from infectious agents or cancer cells, which bind to human leukocyte antigen (HLA) class I and II molecules, is of importance for the development of new effective vaccines capable of activating the cellular arm of the immune response. However, the barrier to the development of peptide-based vaccines with maximum population coverage is that the restricting HLA genes are extremely polymorphic, resulting in a vast diversity of peptide-binding HLA specificities and a low population coverage for any given peptide-HLA specificity. One way to reduce this complexity is to group thousands of different HLA molecules into several so-called HLA supertypes: a classification that refers to a group of HLA alleles with largely overlapping peptide binding specificities. In this chapter, we focus on the state-of-the-art classification of HLA supertypes, including HLA-I supertypes and HLA-II supertypes, and their application in development of peptide-based vaccines.

The immune system, including the innate and adaptive as well as overlapping systems, plays a pivotal role in the defense against viral or bacterial infections, immune homeostasis, and cancer surveillance. Within the immune system, T lymphocytes are crucial for adaptive immune responses, and are activated upon recognition of peptides displayed by human leukocyte antigen class I (HLA-I) or class II (HLA-II) molecules at the surfaces of antigen-presenting cells (APCs). T lymphocytes express the T cell receptor (TCR) that recognizes specific peptides, which have been processed and presented in combination with an HLA molecule. There are two major subtypes of T lymphocytes: CD8+ cytotoxic T cells (CTLs) and CD4+ helper T cells.
CTLs recognize peptides in the context of HLA-I molecules, while CD4+ helper T cells recognize peptides associated with HLA-II molecules. The functional activity of these two subsets of T cells is said to be restricted by HLA-I and HLA-II molecules, respectively. It is known that CTLs play a major role in killing tumor cells [1, 2] and controlling viral or bacterial infections [3-7], while CD4+ T cells are required for priming and expansion of naive CD8+ T cells as well as secondary expansion of CD8+ memory T cells [8-12]. It might therefore be of critical importance to incorporate both HLA-I- and HLA-II-restricted epitopes in peptide-based vaccines to obtain participation of both CD4+ and CD8+ T cells for generation of strong and long-lasting immunity. Thus, identification of new antigenic peptides, derived from infectious agents or tumor antigens, which may bind to HLA-I or HLA-II molecules in exchange with self-peptides normally occupying the HLA-binding site (see below), is important for developing new effective vaccines capable of activating the cellular arm of the immune response.

However, the barrier to development of peptide-based vaccines with maximum population coverage is that the restricting HLA genes are extremely polymorphic, resulting in a vast diversity of peptide-binding HLA specificities and a low population coverage for any given peptide-HLA specificity. As of April 2013, 7,089 HLA-I alleles and 2,065 HLA-II alleles had been reported (http://hla.alleles.org). These numbers will undoubtedly continue to increase. To reduce this complexity, one option is to group thousands of different HLA molecules into clusters of several so-called HLA supertypes: a classification that refers to a group of HLA alleles with largely overlapping peptide binding specificities.
In this chapter, we discuss the state-of-the-art classification of HLA-I and HLA-II supertypes and their application in development of peptide-based vaccines.

The major histocompatibility complex class I (MHC-I) antigens are referred to as human leukocyte antigens class I (HLA-A, -B, and -C) in humans and as H-2 class I antigens (K, D, and L) in mice. HLA-I antigens consist of three non-covalently associated components: a 45 kDa glycosylated heavy chain (HC), a 12 kDa light chain (beta 2 microglobulin, β2m), and a short self-peptide of 8-10 amino acids (AA). The heavy chain of HLA-I consists of about 340 AA residues, including a cytoplasmic region (about 30 AA residues), a transmembrane region (about 40 AA residues), and an extracellular region composed of three immunoglobulin-like domains (α1, α2, and α3), each consisting of approximately 90 AA. The α1 and α2 domains form a peptide-binding groove and contain the positions contributing to the binding pockets for the peptide and to contact with T cell receptors. The binding groove is divided into six distinct pockets (A-F) based on chemical and physical characteristics; the most important pockets for peptide binding are the B and the F pockets. The membrane-proximal α3 domain of the HC contains a binding site for the co-receptor CD8 [13], which is expressed by CTLs and plays an important enhancing role in killing virus-infected cells and cancer cells. The α1 and α2 domains consist of two segmented alpha helices forming the walls and eight antiparallel β strands forming the floor. Together they form a unique peptide-binding groove, which is the site where the self (or foreign antigen-derived) peptide (8-10 AA) binds to the polymorphic parts of the HC and is presented to peptide-specific CTLs for scrutiny. β2m associates with the extracellular region of the HLA-I heavy chain through non-covalent interactions with the α2 and α3 domains [14].
β2m is essential for the correct conformation of the peptide-binding groove of the heavy chain and stabilizes the HLA-I-peptide complex on the cell surface. Thus, β2m indirectly participates in antigen presentation to the specific T cell receptors of CTLs [15-17].

The assembly of the HLA-I-peptide complex occurs in the endoplasmic reticulum (ER). Initially, the HLA-I HC associates with the chaperone calnexin (CNX), initiating early folding and disulfide bond formation within the HC. The newly synthesized HLA-I HC then associates with β2m to form a heterodimer. This heterodimer is rapidly recruited into the peptide-loading complex (PLC), consisting of the transporter associated with antigen processing (TAP) and the chaperones tapasin, calreticulin (CRT), and ERp57. The HLA-I HC/β2m heterodimer is now ready for peptide loading. Peptides, both self- and pathogen-derived, are predominantly generated in the cytosol by the proteasome, which degrades cytosolic proteins into short peptides, although a proteasome-independent peptide produced directly by insulin-degrading enzyme has recently been documented [18]. Thereafter, the peptides are transported into the ER by TAP1 and TAP2. These peptides are further trimmed by the aminopeptidases ERAP1 and ERAP2 to 8-10 AA, a length appropriate for HLA-I binding. Once HLA-I HC/β2m heterodimers, physically associated with the PLC, bind a subset of high-affinity peptides, the fully assembled HLA-I-peptide complexes are released from the PLC and transported via the Golgi apparatus to the cell surface, where the peptides are presented by HLA-I to CTLs for scrutiny (see details in reviews [19, 20]).

The HLA-II molecule consists of two chains, an α and a β chain (each with two domains: α1 and α2, β1 and β2), and a self-peptide of 13-25 AA located in a cleft formed by the α1 and β1 domains.
Classical HLA-II molecules include HLA-DR, HLA-DQ, and HLA-DP and are expressed mostly in the membrane of professional antigen-presenting cells, where they present processed extracellular antigenic peptides to CD4+ T cells. In contrast to the antigen-binding groove of the HLA-I molecule, which is closed at each end, the antigen-binding groove of HLA-II molecules is open at both ends and allows longer peptides (13-25 AA) to be loaded [21, 22]. During synthesis of HLA-II molecules in the ER, the α and β chains are produced and associate with an invariant chain, which stabilizes the HLA-II molecule and prevents it from binding intracellular peptides or peptides from the endogenous pathway. The invariant chain directs transport of HLA-II from the ER to the Golgi complex, followed by fusion with late endosomes, which contain peptides derived from endocytosed, degraded proteins (self or foreign). The invariant chain is then cleaved by cathepsins to form a small fragment known as CLIP, which occupies the peptide-binding groove of the HLA-II molecule. HLA-DM facilitates CLIP removal and makes the peptide-binding groove of HLA-II ready for peptide loading before the HLA-II-peptide complex migrates to the cell surface to be scrutinized by CD4+ T cells [23].

The concept of supertypes was first introduced by Alessandro Sette's group in 1995 [24, 25]. An HLA supertype groups HLA molecules with similar peptide-binding features; by this definition, a peptide that is able to bind to one allele within a supertype should also bind to all other alleles in that supertype. In practice, only a few of the peptides that bind to one allele in a supertype can bind to all the other alleles within the supertype. To date, many methods have been used to define HLA-I supertypes, including structural similarities, shared peptide-binding motifs, and identification of cross-reacting peptides [26-29].
Based on motifs derived from binding data or sequencing of endogenously bound peptides, along with simple structural analyses, Sette and Sidney [30] defined nine supertypes (HLA-A1, -A2, -A3, -A24, -B7, -B27, -B44, -B58, -B62), which were reported to cover most of the HLA-A and -B polymorphisms. Subsequently, Ole Lund's group [26] constructed hidden Markov models (HMMs) [31] for HLA-I molecules using a Gibbs sampling procedure [32] and defined a similarity measure between these sequence motifs. By using this similarity to cluster alleles into supertypes, Ole Lund's group [26] further defined three new HLA-I supertypes (HLA-A26, -B8, and -B39), in addition to the nine supertypes described previously by Alessandro Sette's group [30], which was based on about 100 HLA-I peptide interactions.

In the past few years, a large amount of binding data has been generated; MHC-binding motif information is readily accessible (http://www.iedb.org), and MHC sequence data are also available in the IMGT database (the international ImMunoGeneTics information system: http://www.imgt.org). In 2008, Alessandro Sette's group analyzed the updated list of alleles available through IMGT using a simple approach largely based on compilation of published motifs, binding data, and analyses of shared repertoires of binding peptides, in combination with clustering based on the primary sequence of the B and F peptide-binding pockets [29]. They provided updated supertype assignments, with new assignments for about 1,000 different HLA-I alleles, which is about a tenfold increase in the number of alleles compared to their original classification from 1999 [30]. In the updated HLA-I classification, Alessandro Sette's group found that about 80% of the 945 alleles examined were classified into one of the nine supertypes identified previously [30], and they did not suggest the existence of any other novel supertypes.
However, they found that some alleles have specificities spanning two different supertypes: nine alleles share features of both the A01 and A03 supertypes, and another ten alleles have a specificity overlapping the A01 and A24 supertypes [29]. In addition, some alleles could not be assigned to any currently known supertype on the basis of the criteria mentioned above; these unclassified alleles remain to be addressed. In summary, the updated HLA-I classification described by Alessandro Sette's group [29] is in agreement with those defined by other approaches from other groups [26, 33, 34], including Ole Lund's group, and is now widely accepted and has been used for development of peptide-based vaccines [29, 35, 36].

The structural composition of HLA-I and HLA-II molecules is fundamentally different, leading to very different binding characteristics. The binding groove is closed at both ends in an HLA-I molecule, while the peptide-binding groove of HLA-II molecules is open at both ends, which allows the binding of longer peptides (13-25 AA residues) than HLA-I molecules can accommodate. A deeper understanding of the polymorphism of HLA-II molecules will contribute significantly to HLA-II-binding peptide prediction and classification of supertypes. In contrast to HLA-I supertypes, HLA-II supertypes have been less intensively studied, although a few studies on HLA-II supertypes [26, 37-41] have been reported. One important reason is that peptide binding data for HLA-II molecules are less available than those for HLA-I molecules due to the complexity of HLA-II structure. Nevertheless, studies have suggested that many DR molecules [26, 37, 38] and many DP molecules supported the existence of three main binding supertypes among HLA-DP molecules. In 2005, Doytchinova et al.
[37] applied a combined bioinformatics approach using both protein sequence and structural data, to 2,225 HLA-II molecules, to detect similarities in their peptide-binding sites for definition of HLA-II supertypes. They defined 12 HLA-II supertypes, including five DRs (DR1, DR3, DR4, DR5, and DR9), three DQs (DQ1, DQ2, and DQ3), and four DPs (DPw1, DPw2, DPw4, and DPw6). In 2011, Greenbaum et al. [41] determined the binding capacity of a large panel of non-redundant peptides for a set of 27 common HLA DR, DQ, and DP molecules. The measured binding data were then used to define class II supertypes on the basis of shared binding repertoires. Seven different supertypes (main DR, DR4, DRB3, main DQ, DQ7, main DP, and DP2) were defined.

Subsequently, according to motif-based supertype classification [27], seven different supertypes were defined after the analysis of 27 HLA-II proteins described in a previous report [41]. All the molecules belonging to the DP genetic locus (DPB1*0101, DPB1*0201, DPB1*0401, DPB1*0402, DPB1*0501, and DPB1*1401) were grouped into a single supertype; DQ proteins were grouped into two different supertypes, each containing three HLAs: (DQB1*0301, DQB1*0302, DQB1*0401) and (DQB1*0201, DQB1*0501, DQB1*0602). The motif-based classification of the DR proteins is less defined compared with the other loci. The HLA-DR can be grouped into four supertypes: (DRB1*0401, DRB1*0405, DRB1*0802, DRB1*1101), (DRB3*0101, DRB3*0202), (DRB1*0301, DRB1*1302), and the fourth containing the remaining DR proteins. Functional and motif-based clustering of 27 defined HLA-II molecules revealed the presence of proteins sharing both functional and structural properties, thus supporting the concept of HLA-II supertypes.
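The idea of defining supertypes from shared binding repertoires can be illustrated with a small sketch. The code below is not the procedure used in the cited studies; it assumes a simple Jaccard overlap between the sets of peptides bound by each allele and a single-linkage grouping at an arbitrary threshold. The allele labels, peptide identifiers, function names, and the threshold of 0.5 are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Fraction of peptides shared by two binding repertoires (sets)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_by_repertoire(repertoires, threshold=0.5):
    """Group alleles whose repertoires overlap at or above `threshold`.

    Uses single-linkage grouping via union-find. This is a deliberate
    simplification for illustration, not a published clustering method.
    """
    parent = {allele: allele for allele in repertoires}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(repertoires, 2):
        if jaccard(repertoires[a], repertoires[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for allele in repertoires:
        groups.setdefault(find(allele), set()).add(allele)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical toy data: allele labels and peptide IDs are made up.
toy_repertoires = {
    "allele_A": {"p1", "p2", "p3", "p4"},
    "allele_B": {"p2", "p3", "p4", "p5"},
    "allele_C": {"p7", "p8", "p9"},
    "allele_D": {"p8", "p9", "p10"},
    "allele_E": {"p20"},
}
clusters = cluster_by_repertoire(toy_repertoires, threshold=0.5)
for members in clusters:
    print(members)
```

With real data, each repertoire would be the set of peptides measured to bind an allele above some affinity cutoff, and the choice of similarity measure and linkage would need to be justified rather than assumed.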
To date, one of the major drawbacks of a peptide-based vaccine strategy is that the restricting HLA genes are extremely polymorphic, resulting in a vast diversity of peptide-binding HLA specificities and a low population coverage for any given peptide-HLA specificity. To increase population coverage, one might include defined epitopes for each HLA-I allele; however, this would lead to a vaccine comprising hundreds of peptides. As mentioned above, one way to reduce this complexity is to group HLA molecules into HLA supertypes, a classification that refers to a group of HLA alleles with largely overlapping peptide binding specificities [24, 25, 30]. Ideally this means that a peptide that binds to one allele within a supertype has a high probability of binding to other allelic members of the same supertype. The concept of HLA supertypes has been successfully applied to characterize and identify T cell epitopes from a variety of different pathogens, including measles-mumps-rubella, SARS, EBV, HIV, HCV, HBV, HPV, influenza, LCMV, Lassa virus, F. tularensis, and vaccinia, as well as from cancer antigens [29]. HLA supertypes have been utilized as a component in several approaches and algorithms designed for predicting peptide candidates [43-48].

The technology behind "reverse immunology" is developing rapidly in order to identify T cell epitopes from tumor antigens and infectious microorganisms [44-51]. During the SARS epidemic in 2003, the SARS genome was identified in a matter of weeks, and a complete CTL epitope scan, just barely possible at that time, was completed a few months later [43]. "Reverse immunology" as a tool to identify T cell epitopes has therefore reached the stage where genome-, pathogen-, and HLA-wide scanning for HLA-binding antigenic epitopes has become feasible at a scale and speed that makes it possible to exploit genome information as fast as it can be generated.
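As a minimal sketch of the first step of such a scan, the code below enumerates every 8-10 AA window of a protein sequence (the length range given above for HLA-I ligands) and keeps the windows whose position 2 and C-terminal residues fall in allowed anchor sets, the positions that sit in the B and F pockets. The sequence, the anchor residue sets, and the function names are illustrative assumptions; real pipelines rank candidates with trained binding predictors rather than a two-position filter.

```python
def candidate_peptides(protein, lengths=(8, 9, 10)):
    """Yield (start, peptide) for every window of the given lengths."""
    for k in lengths:
        for i in range(len(protein) - k + 1):
            yield i, protein[i:i + k]


def matches_anchor_motif(peptide, p2_residues, c_term_residues):
    """True if position 2 and the C-terminal residue are in the allowed sets."""
    return peptide[1] in p2_residues and peptide[-1] in c_term_residues


# Made-up sequence and illustrative anchor sets (assumptions, not measured motifs).
protein = "MKLTAVLGSILDEFWV"
hits = [
    (start, peptide)
    for start, peptide in candidate_peptides(protein)
    if matches_anchor_motif(peptide, p2_residues="LM", c_term_residues="VLI")
]
for start, peptide in hits:
    print(start, peptide)
```

Running the same filter once per supertype motif, instead of once per allele, is what keeps an HLA-wide scan tractable.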
Importantly, a large-scale dataset of measured HLA-II-binding affinities covering 26 allelic variants, including a total of 44,541 affinity measurements for HLA-DR alleles as well as 11 HLA-DP and DQ molecules [52], is available as training data for generating prediction tools based on several machine learning algorithms. Computer-based algorithms like those used to predict peptides binding to HLA-I molecules are now being developed for HLA-II-restricted peptide epitopes, a development that is of pivotal importance for understanding the immune response and its effect on host-pathogen interactions [32, 52-55]. These tools will lead to fast identification of novel peptides restricted by HLA-I and HLA-II supertypes for use in vaccines against infectious agents as well as tumors. In this respect, individual peptides harboring both HLA-I and HLA-II binding potential [46-48, 56] might be of particular importance.

In conclusion, classification of HLA supertypes reduces the complexity of HLA polymorphism and has a significant impact on the development of peptide-based vaccines with maximum population coverage. Since CD4+ T cells are required for priming of naive CD8+ T cells as well as expansion of CD8+ memory T cells [8-12], it is of critical importance to incorporate both HLA-I and HLA-II supertype-restricted epitopes in peptide-based vaccines to obtain participation of both CD4+ and CD8+ T cells for generation of strong and long-lasting immunity.
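The population-coverage argument can be made concrete with a back-of-the-envelope calculation. The sketch below assumes that carrying an allele from one supertype is statistically independent of carrying an allele from another, so the covered fraction is one minus the product of the uncovered fractions. That independence is a simplifying assumption, and the supertype labels and frequencies are hypothetical placeholders rather than measured values.

```python
from math import prod


def population_coverage(phenotype_freqs):
    """Fraction of individuals carrying at least one covered supertype.

    Assumes independence between supertypes, which is a simplification.
    """
    freqs = list(phenotype_freqs)
    if any(f < 0.0 or f > 1.0 for f in freqs):
        raise ValueError("frequencies must lie between 0 and 1")
    uncovered = prod(1.0 - f for f in freqs)
    return 1.0 - uncovered


# Hypothetical phenotype frequencies for three supertypes.
freqs = {"supertype_X": 0.40, "supertype_Y": 0.30, "supertype_Z": 0.20}
coverage = population_coverage(freqs.values())
print(f"{coverage:.3f}")
```

Under these assumed numbers, epitopes restricted by three supertypes reach roughly two thirds of the population, which illustrates why a handful of supertype-restricted peptides can stand in for the hundreds that allele-by-allele coverage would require.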
Peptide-based vaccines for cancer: realizing their potential
Adoptive cell therapy for the treatment of patients with metastatic melanoma
Mycobacterium tuberculosis-specific CD8+ T cells require perforin to kill target cells and provide protection in vivo
Mycobacterium tuberculosis-specific CD8+ T cells and their role in immunity
T-cell epitope discovery for variola and vaccinia viruses
Cytotoxic T-cell immunity to influenza
Transgenic mice lacking class I major histocompatibility complex-restricted T cells have delayed viral clearance and increased mortality after influenza virus challenge
CD4+ T-cell help controls CD8+ T-cell memory via TRAIL-mediated activation-induced cell death
CD4+ T cells are required for secondary expansion and memory in CD8+ T lymphocytes
Requirement for CD4 T cell help in generating functional CD8 T cell memory
Defective CD8 T cell memory following acute infection without CD4 T cell help
CD4+ T cells are required for the maintenance, not programming, of memory CD8+ T cells after acute infection
A binding site for the T-cell co-receptor CD8 on the alpha 3 domain of HLA-A2
Structure of the human class I histocompatibility antigen, HLA-A2
Antigen processing and presentation by the class I major histocompatibility complex
The role of beta 2-microglobulin in peptide binding by class I molecules
Beta 2-microglobulin restriction of antigen presentation
Production of an antigenic peptide by insulin-degrading enzyme
Mechanisms of MHC class I-restricted antigen processing and cross-presentation
Antigen processing by the proteasome
Predominant naturally processed peptides bound to HLA-DR1 are derived from MHC-related molecules and are heterogeneous in size
Sequence analysis of peptides bound to MHC class II molecules
The exogenous pathway for antigen presentation on major histocompatibility complex class II and CD1 molecules
Binding of a peptide antigen to multiple HLA alleles allows definition of an A2-like supertype
Several HLA alleles share overlapping peptide specificities
Definition of supertypes for HLA molecules using clustering of specificity matrices
Consensus classification of human leukocyte antigen class II proteins
Definition of an HLA-A3-like supermotif demonstrates the overlapping peptide-binding repertoires of common HLA molecules
MHC-I-restricted epitopes conserved among variola and other related orthopoxviruses are recognized by T cells 30 years after vaccination
CTL epitopes for influenza A including the H5N1 bird flu; genome-, pathogen-, and HLA-wide screening
HLA class I binding 9mer peptides from influenza A virus induce CD4 T cell responses
High-affinity human leucocyte antigen class I binding variola-derived peptides induce CD4+ T cell responses more than 30 years post-vaccinia virus vaccination
Identification of MHC class II restricted T-cell-mediated reactivity against MHC class I binding Mycobacterium tuberculosis peptides
Reverse vaccinology: developing vaccines in the era of genomics
Major histocompatibility complex class I binding predictions as a tool in epitope discovery
Identification of T-cell epitopes for cancer immunotherapy
Peptide binding predictions for HLA DR, DP and DQ molecules
Quantitative predictions of peptide binding to any HLA-DR molecule of known sequence: NetMHCIIpan
MHC class II epitope predictive algorithms
NetMHCIIpan-2.0: Improved pan-specific HLA-DR predictions using a novel concurrent alignment and weight optimization training procedure
HLA class II presentation of HLA class I binding antigenic 9mer peptides

This work was supported by National Institute of Allergy and Infectious Disease contracts HHSN266200400083C, HHSN266200400025C, EU 6FP 503231, National Institutes of Health contract HHSN266200400081C, and a grant from the Lundbeck Foundation, Copenhagen, Denmark.