key: cord-0993257-gts5bexh authors: Blicher, Thomas; Kastrup, Jette Sandholm; Buus, Søren; Gajhede, Michael title: High‐resolution structure of HLA‐A*1101 in complex with SARS nucleocapsid peptide date: 2005-07-26 journal: Acta Crystallogr D Biol Crystallogr DOI: 10.1107/s0907444905013090 sha: 5240fc3611eff2b6080e05b69f1c635f43e16de6 doc_id: 993257 cord_uid: gts5bexh The structure of the human MHC‐I molecule HLA‐A*1101 in complex with a nonameric peptide (KTFPPTEPK) has been determined by X‐ray crystallography to 1.45 Å resolution. The peptide is derived from the SARS‐CoV nucleocapsid protein positions 362–370 (SNP(362–370)). It is conserved in all known isolates of SARS‐CoV and has been verified by in vitro peptide‐binding studies to be a good to intermediate binder to HLA‐A*0301 and HLA‐A*1101, with IC(50) values of 70 and 186 nM, respectively [Sylvester‐Hvid et al. (2004), Tissue Antigens, 63, 395–400]. In terms of the residues lining the peptide‐binding groove, the HLA‐A*1101–SNP(362–370) complex is very similar to other known structures of HLA‐A*1101 and HLA‐A*6801. The SNP(362–370) peptide is held in place by 17 hydrogen bonds to the α‐chain residues and by nine water molecules which are also tightly bound in the peptide‐binding groove. Thr6 of the peptide (Thr6p) does not make efficient use of the middle (E) pocket. For vaccine development, there seems to be a potential for optimization targeted at this position. All residues except Thr2p and Lys9p are accessible for T‐cell recognition.

Major histocompatibility complex class I (MHC-I) molecules play a pivotal role in the recognition of intracellular pathogens. They bind peptides derived from cytosolic proteins and transport them to the cell surface. This enables cytotoxic T lymphocytes (CTLs) to monitor the metabolic status of every nucleated cell in the body. Infected (or transformed) cells displaying foreign (or aberrant) peptides bound to their surface MHC-I molecules can be identified and subsequently lysed by CTLs, thus eliminating infections (or potentially cancerous cells; Townsend & Bodmer, 1989). MHC-I molecules consist of three chains: the polymorphic transmembrane α-chain (365 amino acids), the monomorphic β-chain (β2m, 99 amino acids) and a peptide of 8-11 amino-acid residues. Assembly of the complexes takes place in the ER, where the process is controlled by a number of chaperones and the TAP peptide transporter (Cresswell, 2000). The extracellular part of MHC-I molecules has a very characteristic structure.
The membrane-proximal β2m/α3 immunoglobulin domains serve as a rigid support for the membrane-distal α1/α2 domains consisting of two α-helices on top of a β-sheet. Here, peptides are bound to MHC-I molecules between the two α-helices through interactions with polymorphic as well as conserved residues in the α-helices and the β-sheet floor. Several invariant interactions with the peptide backbone, especially the N- and C-termini, stabilize the peptide-MHC-I complex (Bjorkman et al., 1987), but the ultimate determinant of the half-life of MHC-I complexes seems to be sequence-dependent interactions with the so-called specificity pockets in the groove. The peptide-binding groove is conventionally subdivided into six such specificity pockets named A to F (Garrett et al., 1989). Each pocket can accommodate the side chain of one or a few closely related amino-acid residues, thus determining the peptide specificity of the MHC-I molecule. The peptide residues preferred by a given MHC-I molecule constitute what is known as a sequence motif. Variation in the pool of bound peptides not defined by sequence motifs includes attachment of non-peptide groups to peptides (Otvos et al., 1996; Speir et al., 1999) as well as binding of extended or shortened peptides (Stryhn et al., 2000). Currently, about 70% of all peptides binding to a given MHC-I molecule can be identified using extended motif definitions (Kast et al., 1994). Although each MHC-I allele has distinct peptide-binding properties, it has been suggested that most MHC-I alleles can be clustered into a few supertypes represented in all ethnic groups. So far, a comparison of sequence motifs has allowed the extraction of 9-12 supertypes (Sette & Sidney, 1999) out of the 975 different MHC-I A and B alleles (January 2005; see http://www.anthonynolan.org.uk/HIG/index.html) and more than 99% of the world population carry at least one of these supertypes (Sette & Sidney, 1999).
Hence, it seems that the number of alleles that have to be considered can be reduced. This may represent a significant simplification of the requirements for the design of peptide-based vaccines with broad population coverage. HLA-A*1101 (A11) is one of the most frequently occurring MHC-I alleles, with an allele frequency of up to 27% in some Southeast Asian populations (Bodmer et al., 1999) . It is known to be important in the control of HIV (Culmann et al., 1991; Johnson & Walker, 1991; Walker et al., 1989) , EBV and HBV infections (Achour et al., 1986) . A11 belongs to the A3 supertype, which is the second most common supertype; at least 44% of the world population carry A3 supertype MHC-I molecules (Sette & Sidney, 1999) . A3 supertype molecules are characterized by a preference for small and/or hydrophobic residues in the second position of the peptide and arginine or lysine at the C-terminus (Sidney et al., 1996) . Specifically, A11 preferentially binds peptides with isoleucine, threonine or valine in position 2 and a C-terminal lysine or arginine residue (Falk et al., 1994; Gavioli et al., 1995; Kubo et al., 1994; Zhang et al., 1993) . Structures of MHC-I molecules belonging to the A3 supertype improve our understanding of peptide binding to these molecules. Furthermore, such models will enable the construction of reliable A3 supertype homology models with any desired peptide, an approach which promises to become a useful way of predicting peptide-MHC-I affinities in the near future (Flower et al., 2003; Logean et al., 2001; Rognan et al., 1999; Schueler-Furman et al., 2000) . Although A3 is one of the most common supertypes, only a few structures belonging to the A3 family have been published to date. Three structures of HLA-A*6801 (A68) exist. 
In one, the peptide identity is unknown (PDB code 2hla; resolution 2.6 Å; Garrett et al., 1989), in another the peptide is partially missing (PDB code 1hsb; resolution 1.9 Å; Guo et al., 1992) and in the third the α3 domain was proteolytically removed prior to crystallization (PDB code 1tmc; resolution 2.3 Å; Collins et al., 1995). In addition, two complete structures of HLA-A*1101 have been published recently (Li & Bouvier, 2004; PDB codes 1q94 and 1qvo; resolution 2.4 and 2.22 Å, respectively). In this study, we describe the first high-resolution X-ray structure of HLA-A*1101 in complex with a peptide derived from SARS coronavirus (CoV) nucleocapsid protein (see below). This peptide has been suggested as a putative vaccine candidate. The first known case of Severe Acute Respiratory Syndrome (SARS) occurred in Guangdong Province in southeast China in November 2002 (Cherry, 2004). At the beginning of 2003, a large number of cases appeared and SARS quickly spread to more than 30 countries all over the world. By March 2003, the causative agent had been identified as a novel member of the coronavirus family (order Nidovirales, family Coronaviridae, genus Coronavirus). This class of viruses is responsible for up to 30% of all mild upper respiratory tract illnesses in humans (Siddell, 1995). To date, over 8400 cases of SARS-CoV infection have been identified worldwide. Approximately 10% of all infected individuals died from progressive respiratory failure (Centers for Disease Control and Prevention, 2003) and in elderly people the mortality was as high as 55%. Since the initial outbreak, the number of reported cases has dropped significantly; however, SARS-CoV is far from eradicated. Of special concern is the existence of SARS-CoV in several non-human species. Indeed, evidence suggests a non-human origin of SARS-CoV (reviewed in Poon et al., 2004).
The consequences of a worldwide SARS pandemic are grim and the World Health Organization still considers the development of a vaccine against SARS-CoV to be of great importance.

1.1.1. The N protein. The product of the SARS N gene, the nucleocapsid or N protein, is a protein of 46 kDa (422 amino-acid residues). It interacts with the viral RNA and presumably forms a helical arrangement in the centre of the virus particle, as do N proteins from other coronaviruses (Siddell, 1995). Furthermore, the N protein has been shown to induce actin rearrangement and apoptosis in cells under stress (Surjit, Liu, Jameel et al., 2004). A highly basic sequence unique to SARS-CoV in the C-terminal part of the protein (residues 362-381, KTFPPTEPKKDKKKKTDEAQ) resembles a nuclear localization signal (Marra et al., 2003). Although its function remains unknown, the high pI of the peptide suggests binding to RNA. In addition, the C-terminal half of the protein is known to contain a dimerization domain (Surjit, Liu, Kumar et al., 2004), the function of which remains to be elucidated.

2. Materials and methods

The A*1101 α-chain was cloned from the B cell line KHAGNI (European Collection of Cell Cultures) grown in RPMI with 10% FCS and 1% penicillin/streptomycin at 310 K, 5.0% CO2. mRNA was extracted using a combination of two mRNA-purification kits: the RNA Isolation Kit from Stratagene and the RNeasy kit from Qiagen. The mRNA was reverse-transcribed with the GeneAmp RNA PCR kit from Perkin-Elmer and the DNA encoding the 275 N-terminal (extracellular) amino acids of the A*1101 α-chain was amplified using VentR DNA polymerase (New England Biolabs) and primers purchased from Hobolth DNA Syntese (Allerød, Denmark). The amplified DNA was purified (QIAquick PCR Purification kit) and cut with XhoI and NcoI restriction enzymes (New England Biolabs).
Digested A*1101 DNA was ligated into a pET-28a vector using T4 DNA ligase (New England Biolabs) and transformed into chemically competent Escherichia coli BL21-DE3. The correctness of the DNA was verified by sequencing. Recombinant human β2m and A*1101 α-chain were generated according to protocols described in previous work (Sidney et al., 1996). Briefly, β2m and A*1101 α-chain were produced in E. coli as inclusion bodies. Following resolubilization of the inclusion bodies in 8.0 M urea, the crude protein was purified by size-exclusion and ion-exchange chromatography. β2m was further purified and refolded on a Ni-chelating column by virtue of an N-terminal His6 tag. Prior to use, the His6 tag was removed by cleavage with factor Xa followed by size-exclusion chromatography. This β2m is indistinguishable from β2m purified from natural sources (Pedersen et al., 1995). Folding of HLA-A*1101 complexes was performed by rapid 100-fold dilution of A11 solution (approximately 200 mM/0.6 mg ml−1) into a folding buffer containing 50 mM Tris-HCl pH 7.5, 150 mM NaCl, 3 mM EDTA, 1.0 mM β2m and 2.0 mM SARS-CoV nucleocapsid peptide (residues 362-370). The folding solution was left at 291 K for 16-24 h and then concentrated to 1/20 of its original volume. After another 16-24 h at 277 K, misfolded protein was removed by centrifugation at 5000g for 30 min at 277 K. The supernatant was further concentrated to 1/20 of its original volume, left for another 16-24 h and finally centrifuged at 20 000g for 20 min at 277 K. The supernatant was applied onto a Superdex 75 column and eluted with a buffer containing 150 mM NaCl and 100 mM MES pH 6.5. Fractions containing the HLA-A*1101 complexes were concentrated after buffer exchange (0.1 M MES pH 6.5) to 6.5 mg ml−1. The complex was purified to more than 95% purity (estimated by SDS-PAGE; data not shown).
The SARS peptide stock used in the preparation of the A11 complexes was purchased from Schafer-N (Copenhagen) as >99% HPLC-purified preparations. The peptide was synthesized using FMOC chemistry. Crystals of HLA-A*1101 in complex with the SNP 362-370 peptide were prepared using the hanging-drop technique. Crystallization experiments were set up at 293 K with individual drops consisting of 2 µl protein solution (6 mg ml−1 in 0.1 M MES pH 6.5) and 2 µl reservoir solution. The crystals grew from a crystallization buffer containing 30% PEG 5000 MME, 0.2 M ammonium sulfate (Crystal Screen 2, Hampton Research), appearing as clusters of monoclinic plates after four weeks. These initial crystals were used to streak-seed new drops and produced single crystals of final dimensions 0.2 × 0.1 × 0.05 mm in three weeks. Single crystals were collected, soaked in reservoir solution containing 5% glycerol as a cryoprotectant and flash-frozen in liquid nitrogen. Two data sets (36.0-2.54 and 15.0-1.45 Å resolution) were collected on the same crystal at 100 K at a wavelength of 0.9815 Å and were processed with the HKL suite (Otwinowski & Minor, 1997), giving the data statistics outlined in Table 1. Data were collected at beamline 14-2 at BESSY (Berlin, Germany) using a MAR345 image-plate detector. The space group is P2₁, with unit-cell parameters a = 58.5, b = 80.6, c = 56.6 Å, β = 116.8° and one complex per asymmetric unit. The structure of the A11-SNP 362-370 complex was solved by molecular replacement (MR) in AMoRe (Navaza, 1994) using the structure of HLA-A*6801 without peptide and water molecules (PDB code 1hsb; Guo et al., 1992) as a search model. The program ARP/wARP was used to remove bias from the MR model and subsequently managed to unambiguously exchange the polymorphic side chains and to build the peptide. All subsequent refinements were carried out with SHELXL (Sheldrick & Schneider, 1997), with 5% of the data reserved for the calculation of Rfree. The model after ARP/wARP rebuilding included all 275 residues of the α-chain, all 99 residues of β2m, 171 water molecules and the peptide. The R value was 22.4% and Rfree was 25.8%. Following the addition of 204 water molecules, minor side-chain adjustments using the program O (Jones et al., 1991) and a round of positional and B-factor refinement, R and Rfree fell to 20.4 and 24.2%, respectively. Introduction of another 210 water molecules (now 585 in total) along with seven glycerol molecules gave R and Rfree values of 19.5 and 23.6%, respectively. The resolution of the data allowed all atoms to be refined anisotropically. After anisotropic B-factor refinement and deletion of 28 water molecules, R and Rfree were 16.1 and 21.3%, respectively. Subsequent addition of riding H atoms resulted in R and Rfree values of 14.1 and 19.0%, respectively. Cleaning up the water molecules and a few more rounds of positional and B-factor refinement gave a final R factor of 13.3% and an Rfree of 18.1% (see Table 1). No B-factor cutoff was used during the last rounds of water refinement. Rather, all water molecules with clear 2Fo − Fc density at 1σ and potential hydrogen-bonding partners were retained. The following 20 residues were modelled with double conformations: Ser4, Ser11, Arg35, Arg82, Asn86, Ser88, Met98, Thr134, Gln141, Arg145, Gln156, Arg157, Thr216, Lys243, Ser251, Gln255, Thr4, Ser20, Val27 and Pro4p. There are no direct contacts between the peptide and neighbouring molecules in the crystal. The closest contacts arise from packing of β2m from a neighbouring molecule against the outside of the middle part of the α1 helix.

The SARS-CoV N protein peptide (KTFPPTEPK; SNP 362-370) was crystallized in complex with HLA-A*1101.
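The R and Rfree values quoted for this structure follow the usual crystallographic definition, R = Σ||Fo| − |Fc|| / Σ|Fo|, with Rfree evaluated over the 5% of reflections excluded from refinement. The following is a minimal sketch of that bookkeeping, not the SHELXL implementation: the function names are hypothetical, the amplitudes are assumed to be already on a common scale, and the random selection of the test set is an assumption about how the 5% might be chosen.

```python
import random


def r_factor(f_obs, f_calc):
    """R = sum(||Fo| - |Fc||) / sum(|Fo|) over the reflections supplied."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def split_work_free(n_reflections, free_fraction=0.05, seed=0):
    """Return (work, free) index lists; 'free' is never used in refinement."""
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    n_free = max(1, round(free_fraction * n_reflections))
    free = sorted(indices[:n_free])
    work = sorted(indices[n_free:])
    return work, free


def r_work_and_free(f_obs, f_calc, free_fraction=0.05, seed=0):
    """Compute R on the working set and Rfree on the reserved test set."""
    work, free = split_work_free(len(f_obs), free_fraction, seed)
    r_work = r_factor([f_obs[i] for i in work], [f_calc[i] for i in work])
    r_free = r_factor([f_obs[i] for i in free], [f_calc[i] for i in free])
    return r_work, r_free
```

Because the test reflections never enter refinement, a gap between R and Rfree that widens as parameters are added is the usual warning sign of overfitting; in the refinement described above both values fall together.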
The structure was determined at 1.45 Å resolution to an R factor of 13.3% (Rfree = 18.1%). The final model contains all 275 residues of the extracellular domain of A11 (α-chain), all 99 residues of β2m (β-chain), the SNP 362-370 peptide (p chain), seven glycerol molecules, three partially occupied sulfate ions (65, 57, 62%) and 574 water molecules (see Fig. 1 for an overview). The Ramachandran plot calculated with PROCHECK (Laskowski et al., 1993) shows 94.0% of all residues in the most favoured regions, 5.7% in additional allowed regions and only Asp29 of the α-chain in a generously allowed region. The unusual but well defined conformation of Asp29 has been reported for other MHC-I molecules (Hillig et al., 2001). A comparison of the A11-SNP 362-370 structure with the A11-Nef and A11-RT structures (Li & Bouvier, 2004) reveals very similar overall conformations, with only minor variations in the angles between the domains. These minor variations reflect the inherent flexibility of MHC-I molecules as well as differences in crystal packing. Despite differences in the identity of the bound peptide, most residues lining the peptide-binding groove are in almost identical positions (Fig. 2). The only differences are seen at the edge of the binding groove, where side chains from the peptide allow different sets of interactions with otherwise solvent-exposed residues of the α-chain. However, the contribution of these mainly polar interactions to the peptide binding is probably negligible, as they are easily replaced with solvent interactions. Minor local displacements of the α1 and α2 helices caused by the peptide can also be discerned and are of the order of 0.5-1.0 Å. These displacements centre on the middle part of the α1 helix (B/C pockets) and the N-terminal half of the α2 helix (D/E pockets). A comparison of the A11-SNP 362-370 structure with the three existing A68 structures shows the same trends as the comparison with existing A11 structures.
Those residues in the binding groove that differ between the two alleles (A11→A68: Q62R, E63N, I97M, A152V, Q156W, R163T) occupy approximately the same space. The only significant difference is found at the bottom of the central part of the binding groove, where A68 residues Met97 and Trp156, which sit opposite each other, are more bulky than the corresponding A11 residues Ile97 and Gln156. The A11→A68 change in position 156 to a less polar environment (Q→W), in addition to the increase in size, makes the outermost part of the side chain of the shared Arg114 flip over, which in turn forces the side chain of Asp116 to adopt another conformation. These differences affect the shape and charge geometry at the bottom of the peptide-binding groove (F pocket) and might explain why A68 has a slight preference for arginine over lysine at the peptide C-terminal end, while the opposite is true for A11 (see the SYFPEITHI database at http://www.syfpeithi.de/; Rammensee et al., 1999). The long side chain of an arginine residue requires a deeper and wider pocket in order to maintain the conserved network of hydrogen bonds surrounding the peptide C-terminus and this is achieved by substituting Ile97 in A11 with methionine in A68. Although other residue differences do exist between A11 and A68 in the α1α2 region, they are all found on the outside far away from the peptide and are not likely to be important for peptide binding. In general, the positions of the peptide N- and C-termini are independent of the peptide length. Variations in length are accommodated in the central part of the peptide-binding groove, where the peptide adjusts itself to allow optimal interactions between the primary anchor residues (P2 and PC for the A3 supertype) and the corresponding peptide-binding pockets (B and F, respectively).
Short peptides (octamers) optimize their interactions with the B and F pockets by adopting an extended conformation (Reiser et al., 2000, 2002) compared with nonamer or decamer peptides, which adopt various forms of bulging or zigzagging conformations in the central part of the groove (Hillig et al., 2001). The straight conformation tends to place the octameric peptides deeper in the groove. Thus, owing to steric interference caused by Met97 and Trp156 at the floor of the peptide-binding groove, A68 would be expected to have a stronger preference for nonamers or decamers than A11. The set of A11 and A68 example peptides in the SYFPEITHI database confirms this tendency (Rammensee et al., 1999). The other members of the A3 supertype would be expected to have the same preferences as A11 as they all (A3, A31 and A33) have a leucine residue in position 156 and either isoleucine (A3 and A11) or methionine (A31 and A33) in position 97.

Figure 2. Comparison of the peptide-binding grooves of three A11 structures. The A11-SNP 362-370 structure is shown in red, A11-Nef in blue and A11-RT in green (Li & Bouvier, 2004). Residues facing the peptide-binding groove are in almost identical positions or adopt similar rotamer conformations.

Figure 3. Peptide conformations. Superposition of four peptides bound to A3 supertype MHC-I molecules, N-termini on the left. The nonameric SNP 362-370 peptide is shown in red, the decameric Nef (A11; Li & Bouvier, 2004) in blue, the nonameric RT (A11; Li & Bouvier, 2004) in green and an endogenously derived decameric peptide (A68; Collins et al., 1995).

A superposition of the A11 structure with that of other MHC-I complexes demonstrates that the SNP 362-370 peptide adopts a standard nonameric conformation with the canonical positioning of the N- and C-terminal residues (see Fig. 3). The interactions between the SNP 362-370 peptide and the A11 binding groove are summarized in Table 2 and shown in Fig. 4.
The SNP 362-370 peptide is held in place by 17 hydrogen bonds to the α-chain residues and by additional hydrogen bonds to nine water molecules which are also tightly bound in the peptide-binding groove. A total of 13 water molecules occupy key positions in the groove, being either directly involved in peptide binding or contributing to the specific architecture of the various binding pockets. Seven of the water molecules (W19, W46, W87, W152, W159, W178 and W283) form a hydrogen-bonded network starting at the C pocket and ending at the amino group of the C-terminal lysine residue located in the F pocket. A detailed description of the peptide-binding interactions is given below.

3.2.1. The A pocket. The backbone N atom of Lys1p forms the characteristic hydrogen bonds to Tyr7 and Tyr171 and the N-terminal carbonyl O atom makes a hydrogen bond to Tyr159. The aliphatic part of the Lys1p side chain is engaged in van der Waals (vdW) interactions with the side chains of Tyr59 and Trp167, and the distal amino group (Nζ) forms hydrogen bonds to Gln62 as well as to a water molecule (W421) and a sulfate ion (4003). A conserved water molecule (W17) stabilizes the A pocket by hydrogen-bond formation to the carboxyl group of Glu63 and to the hydroxyl groups of Tyr7 and Tyr59. The latter in turn forms a hydrogen bond to the hydroxyl group of Tyr171.

3.2.2. The B pocket. Thr2p is fixed at the entrance to the B pocket via hydrogen bonds from the hydroxyl group to the side chains of Glu63 and Asn66. Another hydrogen bond from Glu63 to the Thr2p amide NH group, in addition to vdW interactions with the side chains of Tyr7, Tyr9, Asn66 and Tyr99, also at the pocket entrance, further restricts the flexibility of Thr2p. The architecture of the entrance to the B pocket seems uniquely suited to accommodate amino-acid residues branched at Cβ.
Indeed, A11 prefers peptides with isoleucine, valine and threonine in the second position (Gavioli et al., 1995), but other aliphatic residues such as methionine, leucine, alanine and serine are also accepted (Falk et al., 1994; Sidney et al., 1996). Met45 and Val67 define the bottom of the B pocket and since Thr2p does not fill the pocket entirely, these residues would have to rearrange slightly to accommodate larger side chains. Opposite the B pocket, Thr2p rests against Tyr159 and forms a hydrogen bond to the Arg163 side chain via the carbonyl group. Finally, a water molecule (W6) forms hydrogen bonds to the side chains of Glu63, Asn66 and Arg163 as well as to the Thr2p carbonyl group. Together, these interactions completely bury Thr2p in the peptide-binding groove.

3.2.3. The C pocket. Pro4p and Pro5p are bound in a similar solvent-exposed manner: Pro4p in the region between the B and C pockets formed by the α1 helix and Pro5p in the C pocket. Both residues stack against the face of a side-chain amide group (Asn66 and Gln70, respectively). Hydrogen bonds are seen between their carbonyl groups and bound water molecules (Pro4p to W296, Pro5p to W134 and W136 and water molecules W46 and W283 to both). Pro5p makes additional vdW interactions with Ala69, Thr73 and the backbone of Gln70. The open nature of the C pocket allows a broad range of residues to be bound.

3.2.4. The D pocket. The D pocket is wide and imposes few restrictions on the nature of the peptide side chain in position 3. Together with the C pocket, it forms the central open part of the peptide-binding groove. While two tyrosine residues (Tyr99 and Tyr159) allow interactions with aromatic and hydrophobic residues including methionine, the presence of Arg114 and Gln156 at the peptide-distal end of the pocket explains how acidic residues and their derivatives (Asp, Glu, Asn and Gln) are also accepted (Falk et al., 1994).
The amide NH group of Phe3p forms a hydrogen bond with the hydroxyl group of Tyr99 and the carbonyl group forms hydrogen bonds to two water molecules (W46 and W283), which are themselves caught below the peptide in the binding groove. The Phe3p side chain makes vdW interactions with Tyr159 and Gln156, the latter assuming two clearly visible conformations. Since Phe3p does not make use of the hydrogen-bonding abilities of Gln156, a water molecule (W178) is inserted between Arg114, Gln156 and W283, pushing the phenyl ring of Phe3p up into a more superficial position close to Pro4p. This means that although the Phe3p backbone is hydrogen-bonded to the peptide-binding groove, the side chain rests on a cushion of water molecules (W178 and W283) and polar residues. The resulting relatively high mobility (B factors of 27 Å² for Cβ increasing to 40 Å² for Cζ) of the Phe3p side chain indicates that the interactions between Phe3p and the D pocket are rather weak.

3.2.5. The E pocket. The side chain of Thr6p points downwards into the groove of the E pocket, where its hydroxyl group forms a hydrogen bond to a water molecule (W391). The side-chain methyl group is in contact with Arg114 and Trp147. W391 forms additional hydrogen bonds with the Ala152 amide carbonyl group and one of the two side-chain rotamers of Gln156. Residual electron density in the vicinity of the Thr6p hydroxyl group and W391 suggests the presence of a disordered water molecule (possibly W391), in agreement with the relatively high B factors of the Thr6p side chain (B factors between 24 and 38 Å²). This disorder, which is not explicitly modelled, might also explain the relatively short hydrogen-bonding distance between W391 and Thr6p, as it represents the distance between average positions. The backbone NH of Thr6p forms a hydrogen bond with W159 in the binding groove, which is itself hydrogen bonded to W283 and W152 on opposite sides, also in the groove.
The Thr6p carbonyl group only binds surface water molecules. Thr6p seems not to be directly involved in any specific interactions essential for binding of the SNP 362-370 peptide. Rather, it merely helps to fill the peptide-binding groove, where a moderately sized polar residue such as asparagine or glutamine would probably be able to make more favourable interactions with the side chains of Arg114, Trp147 and Gln156 and the backbone of Ala152. In the two other published structures of A11, either a serine (AIFQSSMTK) or methionine residue (QVPLRPMTYK) fills part of the E pocket from the P(C−3) position (see also Fig. 3c). In analogy with the C pocket, the E pocket also accepts a rather broad range of amino acids owing to its open nature and residues of inadequate size are supplemented with water molecules. The most solvent-exposed residue of the entire peptide is Glu7p. The backbone NH group approaches the Thr6p hydroxyl group to form a weak hydrogen bond and the backbone carbonyl group forms hydrogen bonds with the indole NH of Trp147 and a water molecule (W87). The side chain makes vdW interactions with the side chains of Trp147 and Ala150 as well as with Pro8p. A water molecule binds between the carbonyl group of Pro8p and the carboxylate group of Glu7p. Pro8p is bound in a manner similar to that of Pro4p and Pro5p, with its carbonyl group forming a hydrogen bond to a surface water molecule (W300). The rest of the residue makes no direct interactions with the peptide-binding groove. The absence of restraints on the side chain allows Pro8p to switch between two conformations. The carbonyl group also forms a very weak hydrogen bond with the Trp147 indole NH group. In fact, the indole NH group is well outside the amide plane of the Glu7p-Pro8p amide bond and seems to interact with the amide-bond electrons. A similar interaction is seen between the Trp147 indole NH group and the carbonyl group of the Thr6p-Glu7p amide group.
However, this bifurcated hydrogen bond is far from ideal hydrogen-bond geometry and its contribution to peptide binding is probably correspondingly low.

3.2.6. The F pocket. The prime determinant of peptide binding is the interaction between the C-terminal residue and the F pocket, which in the case of A11 is tailored to fit lysine (and arginine) residues. This is evident from the large number of interactions to the backbone and the side chain of Lys9p, which fits in between the negatively charged residues Asp74, Asp77 and Asp116. The carboxylate group of Asp77 forms a direct hydrogen bond to the Lys9p amide NH group and an indirect bond to the carboxylate and Nζ groups through water molecules W16 and W19, respectively. W16, which also makes a hydrogen bond to the Thr80 hydroxyl group, is not merely a surface-bound water molecule but an integral part of the F pocket and its B factor is as low as that of the Lys9p carboxylate (∼14 Å²). The carboxylate of Lys9p also forms a number of direct hydrogen bonds to residues Tyr84, Thr143 and Lys146 of the peptide-binding groove. The Nζ group is engaged in hydrogen bonding to Asp116 and two water molecules, W19 and W121, the first forming part of a hydrogen-bonded string of water molecules which stretches all the way back to Phe3p in the C pocket. W19 and W121 are further bound to the side chain of Asp74 and the carbonyl group of Ala117, respectively. Finally, the aliphatic part of Lys9p makes vdW interactions with Asp77, Leu81, Ile95, Tyr123 and Trp147. Interestingly, the Cε-Nζ part is the most flexible part of Lys9p (Cε, 22.7 Å²; Nζ, 22.3 Å²; the remainder, 12-15 Å²), despite the large number of interactions holding the side chain. This is also evident from a slight fanlike smearing of the corresponding electron density. In principle, all residues except Thr2p and Lys9p are directly accessible for T-cell recognition (Fig. 5).
However, the side chains of Phe3p and Thr6p are so deeply placed in the binding groove that their interaction with the T-cell receptor (TCR) might be limited. The residues in those positions could likely be optimized for A11 binding without severely affecting TCR binding. The large number of proline residues is probably responsible for the intermediate affinity of the SNP 362-370 peptide for two reasons. Firstly, they do not contribute any significant hydrogen bonds in the peptide-binding groove. Rather, water molecules are present in the central part of the binding groove to satisfy the polar residues. Secondly, they restrict the flexibility of the peptide, thereby preventing optimization of interactions with the binding groove. This is evident from the increased mobility of the Lys9p side chain, which is usually the most tightly bound part of the peptide (Li & Bouvier, 2004). In this case, however, the proline residues cannot be exchanged, since they are all solvent-exposed and therefore expected to interact directly with the TCR.

In recently published work, the best MHC-I-binding peptides from the SARS-CoV were identified using an artificial neural network approach. In addition to MHC-I binding, peptides were selected such that they had a high probability of being generated by proteasomal processing and subsequently translocated into the ER by TAP. The predicted peptides were tested for MHC-I binding in a biochemical binding assay. From this set of verified binders, we selected a peptide with the sequence KTFPPTEPK (the SNP 362-370 peptide) corresponding to a conserved putative nuclear localization signal (see above). This peptide has an IC 50 (~K d) value of 70 nM for binding to HLA-A*1101 and 186 nM for binding to HLA-A*0301, placing it in the range of intermediate MHC-I binders.
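To put the two measured IC 50 values on an energy scale, the following minimal sketch converts them to approximate binding free energies using ΔG = RT ln(K d / 1 M), relying on the approximation IC 50 ~ K d stated above. The temperature of 298.15 K, the 1 M standard state and the function name are assumptions made for illustration only; they are not taken from the binding assay.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_nM, temperature_K=298.15):
    """Return dG = RT ln(Kd / 1 M) in kcal/mol for a Kd given in nM.

    Hypothetical helper: temperature and standard state are assumptions.
    """
    if kd_nM <= 0:
        raise ValueError("Kd must be positive")
    kd_molar = kd_nM * 1e-9  # convert nM to M (standard state 1 M)
    return R_KCAL_PER_MOL_K * temperature_K * math.log(kd_molar)


# The two IC50 values quoted in the text, used here as approximate Kd values.
ic50_values_nM = [70.0, 186.0]
free_energies = [binding_free_energy(v) for v in ic50_values_nM]

for ic50, dg in zip(ic50_values_nM, free_energies):
    print(f"IC50 = {ic50:5.0f} nM  ->  dG = {dg:6.2f} kcal/mol")

# Difference between the two affinities on the energy scale.
ddg = free_energies[1] - free_energies[0]
print(f"ddG = {ddg:.2f} kcal/mol")
```

Under these assumptions, both values correspond to a binding free energy between roughly -9 and -10 kcal/mol, and the approximately 2.7-fold difference in IC 50 amounts to about 0.6 kcal/mol.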
As the SNP 362-370 peptide is part of a SARS-specific sequence from the N protein which seems to be involved in several functions, it is likely that it is and will remain well conserved among different SARS-CoV strains. Therefore, the SNP 362-370 peptide is a potential candidate for inclusion in a peptide-based vaccine against SARS. Based on a comparison of several A3 supertype MHC-I molecules, the SNP 362-370 peptide is expected to bind to other members of the A3 supertype, as the differences between these appear not to interfere with the SNP 362-370 binding interactions. Previously published structures of A11 (Li & Bouvier, 2004) and A68 (Collins et al., 1995) have suggested the use of additional anchor residues in the P C-3 position, resulting in a double-bulge conformation of the central part of the peptide (see Fig. 3). In the A11-SNP 362-370 complex, Thr6p does not make efficient use of this middle (E) pocket and hence there seems to be a potential for optimization targeted at the P 6 position. Ideally, Thr6p should be replaced by a residue which could displace loosely bound water molecules in the groove and make direct contacts with residues lining the D pocket.

Figure. Peptide-binding pockets shown in stereo. The SNP 362-370 peptide is shown in red, the α-chain in blue and water molecules involved in peptide binding in purple. Hydrogen bonds between the peptide and the α-chain are shown as dashed orange lines, while hydrogen bonds involving water molecules are green. The central part of the peptide is highlighted in bold stick representation in each panel.

The Human MHC Project (Buus, 1999) is an effort to design peptide-based vaccines against any pathogen by means of a complete mapping of MHC-based immune reactivity. As direct screening of every allele is an insurmountable task, there is a need for alternative methods to extend our current knowledge.
A promising strategy involves the use of known MHC-I structures as templates for modelling of any desired peptide-MHC-I complex followed by estimation of the interaction energy (i.e. affinity) for the complex (Flower et al., 2003). An encouraging fact in this context is the observation of the structural invariance of the peptide-binding groove within a supertype. Although the bound peptides are very different, the A11-SNP 362-370 complex is very similar to the other two A11 structures in terms of positions of the residues lining the peptide-binding groove. Even A68 is so similar that it should be possible to model it (and presumably A03, A31 and A33 as well) reliably based on the A11 structures. Extrapolating this to the other MHC-I supertypes means that a few well determined structures belonging to each supertype should suffice as a reliable basis for the modelling of any desired MHC-I-peptide complex. The A11-SNP 362-370 complex structure presented here adds to this expanding list of MHC-I template structures, which hold the information necessary for a direct structural approach to peptide-based vaccine design.

Figure 5. The HLA-A*1101-SNP 362-370 peptide complex as seen from the T-cell perspective (peptide N-terminus on the left). The α-chain is shown in a blue surface representation and the peptide as red sticks. The surface is drawn with a 1.4 Å probe over the α-chain without water molecules. Water molecules involved in peptide binding are shown in purple. Notice the prominent proline residues in the peptide. All residues except Thr2p and Lys9p are accessible for direct interaction with the TCR.

The authors would like to thank Ole Kristensen for assisting with data collection.
This work was supported by the Danish Medical Research Council (grants 9601615 and 22-01-0272), the Fifth Framework program of the European Commission (grant QLRT-1999-00173), the NIH (grant AI49213-02) and the Danish National Research Foundation project DANSYNC.