key: cord-0799602-2sfqsfm1
authors: Anand, Kanchan; Palm, Gottfried J.; Mesters, Jeroen R.; Siddell, Stuart G.; Ziebuhr, John; Hilgenfeld, Rolf
title: Structure of coronavirus main proteinase reveals combination of a chymotrypsin fold with an extra α-helical domain
date: 2002-07-01
journal: The EMBO Journal
DOI: 10.1093/emboj/cdf327
sha: cf584e00f637cbd8f1bb35f3f09f5ed07b71aeb0
doc_id: 799602
cord_uid: 2sfqsfm1

The key enzyme in coronavirus polyprotein processing is the viral main proteinase, M(pro), a protein with extremely low sequence similarity to other viral and cellular proteinases. Here, the crystal structure of the 33.1 kDa transmissible gastroenteritis (corona)virus M(pro) is reported. The structure was refined to 1.96 Å resolution and revealed three dimers in the asymmetric unit. The mutual arrangement of the protomers in each of the dimers suggests that M(pro) self-processing occurs in trans. The active site, comprised of Cys144 and His41, is part of a chymotrypsin-like fold that is connected by a 16 residue loop to an extra domain featuring a novel α-helical fold. Molecular modelling and mutagenesis data implicate the loop in substrate binding and elucidate S1 and S2 subsites suitable to accommodate the side chains of the P1 glutamine and P2 leucine residues of M(pro) substrates. Interactions involving the N-terminus and the α-helical domain stabilize the loop in the orientation required for trans-cleavage activity. The study illustrates that RNA viruses have evolved unprecedented variations of the classical chymotrypsin fold.

Transmissible gastroenteritis virus (TGEV) belongs to the Coronaviridae, a family of positive-strand RNA viruses. Coronaviruses have the largest RNA viral genomes known to date (28 500 nucleotides in the case of TGEV) and share a similar genome organization and common transcriptional and translational strategies with the Arteriviridae (den Boon et al., 1991; Cavanagh, 1997). TGEV infection is associated with severe and often fatal diarrhoea in young pigs (for reviews see Enjuanes and van der Zeijst, 1995; Saif and Wesley, 1999).

The viral proteins required for TGEV genome replication and transcription are encoded by the replicase gene (Eleouet et al., 1995; Penzes et al., 2001). This gene encodes two replicative polyproteins, pp1a (447 kDa) and pp1ab (754 kDa), that are processed by virus-encoded proteinases to produce the functional subunits of the replication complex (reviewed in Ziebuhr et al., 2000).

The central and C-proximal regions of pp1a and pp1ab are processed by a 33.1 kDa viral cysteine proteinase which is called the 'main proteinase' (Mpro) or, alternatively, the '3C-like proteinase' (3CLpro). The name '3C-like proteinase' was introduced originally because of similar substrate specificities of the coronavirus Mpro and picornavirus 3C proteinases (3Cpro) and the identification of cysteine as the principal catalytic residue in the context of a predicted two-β-barrel fold (Gorbalenya et al., 1989a,b). Meanwhile, however, several studies have revealed significant differences in both the active sites and domain structures between the coronavirus and picornavirus enzymes (Liu and Brown, 1995; Lu and Denison, 1997; Ziebuhr et al., 1997, 2000). Also, the crystal structures reported for a number of picornavirus 3C proteinases (Allaire et al., 1994; Matthews et al., 1994; Bergmann et al., 1997; Mosimann et al., 1997) have not been useful in predicting the three-dimensional structures of coronavirus main proteinases. Because of the large phylogenetic distance between the two groups of enzymes, we will use the term coronavirus Mpro throughout this article.

Sequence comparisons (Figure 1) and experimental data obtained for other coronavirus homologues allow us to predict that the mature form of the TGEV Mpro is released from pp1a and pp1ab by autoproteolytic cleavage at flanking Gln↓(Ser,Ala) sites (Eleouet et al., 1995). Accordingly, the TGEV Mpro has 302 amino acid residues that correspond to the pp1a/pp1ab residues 2879–3180. In vivo and in vitro analyses of avian infectious bronchitis virus (IBV), mouse hepatitis virus (MHV) and human coronavirus 229E (HCoV 229E) Mpro activities have shown consistently that the proteinase cleaves the replicase polyproteins at 11 conserved sites and, therefore, it seems reasonable to conclude that the Mpro-mediated processing pathways are conserved in all coronaviruses, including TGEV.
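The correspondence between mature-proteinase numbering and polyprotein numbering quoted above can be made explicit with a short sketch. It assumes only the residue range given in the text (pp1a/pp1ab 2879–3180); the function and constant names are illustrative and do not come from the original study.

```python
# Residue range of the TGEV Mpro within pp1a/pp1ab, as quoted in the text.
PP1A_START = 2879
PP1A_END = 3180
MPRO_LENGTH = PP1A_END - PP1A_START + 1  # inclusive range


def mpro_to_pp1a(mpro_residue: int) -> int:
    """Convert a 1-based mature-Mpro residue number to pp1a numbering."""
    if not 1 <= mpro_residue <= MPRO_LENGTH:
        raise ValueError(f"Mpro residue must lie in 1..{MPRO_LENGTH}")
    return PP1A_START + mpro_residue - 1


def pp1a_to_mpro(pp1a_residue: int) -> int:
    """Convert a pp1a residue number inside the Mpro region to Mpro numbering."""
    if not PP1A_START <= pp1a_residue <= PP1A_END:
        raise ValueError("residue lies outside the Mpro region of pp1a")
    return pp1a_residue - PP1A_START + 1


print(MPRO_LENGTH)        # 302
print(mpro_to_pp1a(144))  # 3022, the catalytic cysteine in pp1a numbering
print(mpro_to_pp1a(41))   # 2919, the catalytic histidine in pp1a numbering
```

The inclusive range reproduces the 302 residues stated for the mature enzyme, which is a useful consistency check on the quoted boundaries.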

The EMBO Journal Vol. 21 No. 13 pp. 3213–3224, 2002 © European Molecular Biology Organization

Previous theoretical studies and experimental data have led to the following conclusions (Bazan and Fletterick, 1988; Gorbalenya et al., 1989a,b; Liu and Brown, 1995; Lu et al., 1995; Ziebuhr et al., 1995, 1997, 2000; Lu and Denison, 1997; Seybert et al., 1997; Ziebuhr and Siddell, 1999; Ng and Liu, 2000): (i) Coronavirus main proteinases employ conserved cysteine and histidine residues in the catalytic site. In TGEV Mpro, these are Cys144 and His41. There has been some debate on the existence of a third residue in the catalytic centre. In common with picornavirus 3C proteinases, the catalytic centre of the coronavirus Mpro is predicted to be embedded in a chymotrypsin-like, two-β-barrel structure in which cysteine (rather than serine) serves as the principal nucleophile. (ii) Coronavirus main proteinases have well-defined substrate specificities. All known cleavage sites contain bulky hydrophobic residues (mainly leucine) at the P2 position, glutamine at the P1 position, and small aliphatic residues at the P1′ position. (iii) Coronavirus main proteinases possess a large C-terminal domain of 110 amino acid residues that is not found in other RNA virus 3C-like proteinases. The characterization of recombinant proteins, in which 33, 28 and 34 C-terminal amino acid residues were deleted from the IBV, MHV and HCoV main proteinases, respectively, resulted consistently in dramatic losses of proteolytic activity, suggesting that the C-terminal domain of Mpro contributes to proteolytic activity through undefined mechanisms.
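The substrate specificity summarized in point (ii) can be expressed as a simple sequence scan. The sketch below is only a rough illustration: the residue sets chosen for 'bulky hydrophobic' (P2) and 'small aliphatic' (P1′) are assumptions made here, not definitions from the study, and a pattern match of this kind says nothing about whether a site is actually cleaved. The test sequence is the 15mer substrate peptide used in the activity assay.

```python
# Residue sets are ASSUMED for illustration; the text only says "bulky
# hydrophobic (mainly leucine)" at P2, glutamine at P1, "small aliphatic" at P1'.
P2_ALLOWED = set("LIVMF")
P1_ALLOWED = set("Q")
P1PRIME_ALLOWED = set("SAG")


def candidate_cleavage_sites(sequence: str) -> list[int]:
    """Return, for each P2-P1-P1' match, the number of residues that lie
    N-terminal of the putative scissile bond (i.e. the 1-based index of P1)."""
    seq = sequence.upper()
    sites = []
    for i in range(1, len(seq) - 1):  # i is the 0-based index of P1
        if (seq[i] in P1_ALLOWED
                and seq[i - 1] in P2_ALLOWED
                and seq[i + 1] in P1PRIME_ALLOWED):
            sites.append(i + 1)
    return sites


substrate = "VSVNSTLQSGLRKMA"  # 15mer from the N-terminal autoprocessing site
print(candidate_cleavage_sites(substrate))  # [8]: bond between Gln8 and Ser9
```

Widening or narrowing the assumed residue sets changes the number of hits, so any real analysis would calibrate them against the experimentally mapped cleavage sites.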

The 1.96 Å TGEV Mpro crystal structure reported herein reveals the structural details of a unique catalytic system and facilitates the interpretation of previously published mutagenesis studies that have, at least in part, remained speculative due to the complete lack of structural information on '3C-like' enzymes.

Structure determination by MAD phasing

The presence of 10 methionine residues in the TGEV Mpro molecule suggested that selenomethionine-based multi-wavelength anomalous dispersion (MAD; Hendrickson et al., 1990) could be used to solve the phase problem. The unit cell dimensions of the crystals (a = 72.8 Å, b = 160.1 Å, c = 88.9 Å, β = 94.3°, space group P2₁) and self-rotation calculations indicated the presence of as many as six TGEV Mpro molecules per asymmetric unit. In the MAD phasing process, we finally succeeded in locating 48 (out of 60) crystallographically independent selenium sites by the 'Shake & Bake' approach to direct methods (Weeks and Miller, 1999), without recourse to heavy atom derivatives or other methods of phasing (see Materials and methods). The phases obtained resulted in a readily interpretable electron density map.

Quality of the model

All six copies (designated A–F) of the TGEV Mpro in the asymmetric unit of the crystal could be built into well-defined electron density (Figure 2), which covered almost all of the 302 amino acid residues of each monomer. The only exceptions were the two C-terminal residues, which were not visible in five of the six chains. Monomers A, E and F also lacked electron density for residue 300.

Figure 1 (caption, beginning missing): [...] (Thompson et al., 1997), and corrected manually on the basis of the three-dimensional structure of TGEV Mpro. The corresponding sequences of FIPV (strain 79–1146), HCoV (strain 229E), bovine coronavirus (BCoV, isolate LUN), MHV (strain JHM) and IBV (strain Beaudette) were derived from the replicative polyproteins of the respective viruses whose sequences are deposited at the DDBJ/EMBL/GenBank database (accession Nos: FIPV, AF326575; HCoV, X69721; BCoV, AF391542; MHV, M55148; IBV, M95169; TGEV, AJ271965). The β-strands and α-helices as revealed in the TGEV Mpro crystal structure (this study) are shown above the sequence alignment (see also Figures 4 and 5). Black background colour indicates the catalytic cysteine and histidine residues. Grey background colour indicates the key residue of the S1 subsite (TGEV Mpro His162) and its equivalents in other coronavirus main proteinases. Also shown in grey are the phenylalanine and tyrosine residues (TGEV Mpro Phe139 and Tyr160) that are proposed to stabilize the neutral state of His162 (see text for details).

The final model comprises 1798 amino acid residues and 1006 water molecules, as well as 27 sulfate ions, nine dioxane molecules and six 2-methyl-2,4-pentanediol (MPD) molecules from the crystallization medium. The refinement converged to a final R-factor of 0.210 and an Rfree (Brünger, 1992) of 0.256, with good stereochemistry. Altogether, 88.4% of the amino acid residues were found in the most favoured regions of the Ramachandran plot, and 10.8% were in additionally allowed regions. Residues Asn70, Asn71 and Ser279 were in regions only generously allowed, but had clear electron density.

The six TGEV Mpro monomers present in the asymmetric unit are arranged in three dimers (Figure 3). Each monomer is folded into three domains, the first two of which are antiparallel β-barrels reminiscent of those found in serine proteinases of the chymotrypsin family (Figure 4). Residues 8–100 form domain I, and residues 101–183 make up domain II. The connection to the C-terminal domain III is formed by a long loop comprising residues 184–199. Domain III (residues 200–302) contains a novel arrangement of five α-helices. A deep cleft between domains I and II, lined by hydrophobic residues, constitutes the substrate-binding site. The catalytic site is situated at the centre of the cleft.
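For bookkeeping, the domain boundaries given above can be captured in a small lookup, sketched here under the assumption that residues 1–7, which the text does not assign to a domain, are simply labelled as N-terminal residues. The names are illustrative only.

```python
# Boundaries as stated in the text; the label for residues 1-7 is an
# assumption, since the text assigns them to no domain.
REGIONS = [
    (1, 7, "N-terminal residues"),
    (8, 100, "domain I"),
    (101, 183, "domain II"),
    (184, 199, "loop"),
    (200, 302, "domain III"),
]


def region_of(residue: int) -> str:
    """Return the structural region containing a 1-based Mpro residue number."""
    for first, last, name in REGIONS:
        if first <= residue <= last:
            return name
    raise ValueError("residue number outside 1..302")


for label, number in [("His41", 41), ("Cys144", 144), ("Asp186", 186), ("Glu286", 286)]:
    print(label, "->", region_of(number))
```

With this mapping the two catalytic residues fall in different barrels (His41 in domain I, Cys144 in domain II), consistent with the active site lying in the cleft between them.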

The interior of the β-barrel of domain I consists entirely of hydrophobic residues. A short α-helix (helix A; Tyr53–Ser58) closes the barrel like a lid. Domain II is smaller than domain I and also smaller than the homologous domain II of chymotrypsin and hepatitis A virus (HAV) 3Cpro (Tsukada and Blow, 1985; Allaire et al., 1994; Bergmann et al., 1997). Several secondary structure elements of HAV 3Cpro (strands b2II and cII and the intervening loop) are missing in the TGEV Mpro. Also, the domain II barrel of the TGEV Mpro is far from perfect (Figure 4). The segment from Gly135 to Ser146 forms a part of the barrel, even though it consists mostly of consecutive loops and turns. In fact, in contrast to domain I, a structural alignment of domain II has proven difficult. The superposition of domains I and II of the TGEV Mpro onto those of the HAV 3Cpro yields an r.m.s.d. of 1.85 ± 0.05 Å for 114 equivalent (out of 184 compared) Cα pairs, while domain II alone displays an r.m.s.d. of 3.25 ± 0.28 Å for 57 (out of 85) Cα pairs. Domain III is composed of five, mostly antiparallel, α-helices and the loops connecting them. The crossover angles are ~90° between helices B and E, ~30° between B and D, ~20° between C and E, and ~80° between E and F, whereas C–B and B–F are parallel to each other (see Figure 5). Interhelical contacts are mediated by hydrophobic side chains. The loops between the helices are quite long and fill up most of the interstitial space of domain III. Database searches (Holm and Sander, 1993; Gilbert et al., 1999) did not reveal other proteins or protein domains with the same topology as domain III. The N-terminal segment (residues 1–5) of the polypeptide chain folds onto domain III, placing the N-terminus of the protein within 17.0 (±2.7) Å of the C-terminus (Figure 4).
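Interhelical crossover angles such as those quoted for domain III can be estimated from Cα coordinates. The following is a minimal sketch of one common approximation, in which each helix axis is taken as the vector between the centroids of its first and last four Cα atoms; it is not the procedure used in the study, which is not described in the text, and it returns an unsigned angle only. The coordinates are synthetic ideal helices generated for the demonstration.

```python
import math


def centroid(points):
    """Arithmetic mean of a list of (x, y, z) tuples."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def helix_axis(ca_coords, window=4):
    """Approximate helix axis: vector from the centroid of the first
    `window` C-alpha atoms to the centroid of the last `window`."""
    if len(ca_coords) < 2 * window:
        raise ValueError("helix too short for the chosen window")
    start = centroid(ca_coords[:window])
    end = centroid(ca_coords[-window:])
    return tuple(e - s for s, e in zip(start, end))


def angle_between(u, v):
    """Unsigned angle in degrees (0-180) between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / norm))  # guard against rounding error
    return math.degrees(math.acos(cosine))


def ideal_helix(n_residues, along="z"):
    """Synthetic C-alpha trace of an ideal alpha-helix (demonstration data):
    1.5 A rise and 100 degrees rotation per residue, 2.3 A radius."""
    coords = []
    for i in range(n_residues):
        t = math.radians(100.0 * i)
        x, y, z = 2.3 * math.cos(t), 2.3 * math.sin(t), 1.5 * i
        coords.append((x, y, z) if along == "z" else (z, x, y))
    return coords


helix_1 = ideal_helix(12, along="z")
helix_2 = ideal_helix(12, along="x")
crossing = angle_between(helix_axis(helix_1), helix_axis(helix_2))
print(f"estimated crossing angle: {crossing:.1f} degrees")
```

Because the four-residue centroids do not lie exactly on the helix axis, the estimate for two perpendicular ideal helices deviates slightly from 90°; a least-squares axis fit would reduce this error.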

The six copies of the TGEV Mpro in the asymmetric unit of the crystal are highly similar.

The active site of the coronavirus Mpro is similar to those of the picornavirus 3C proteinases, as had been predicted earlier (Gorbalenya et al., 1989b). The mutual arrangement of the nucleophilic Cys144 and the general acid–base catalyst His41 of TGEV Mpro is identical to that of the HAV 3Cpro Cys172 and His44 residues and the Ser195 and His57 residues of chymotrypsin. The distance between the sulfur atom of Cys144 and the Nε2 of His41 is 4.05 (±0.04) Å, i.e. longer than the corresponding cysteine–histidine distances in HAV 3Cpro (3.92 Å; Bergmann et al., 1997), poliovirus (PV) 3Cpro (3.4 Å; Mosimann et al., 1997) and papain (3.65 Å; Kamphuis et al., 1984) (Figure 6B and C). In contrast to papain, but in agreement with the picornavirus 3C proteinases, the sulfur atom is in the plane of the histidine imidazole. There are clear indications from the difference Fourier synthesis (Figure 6A) that Cys144 is oxidized, at least to the stage of the sulfinic acid, -SO₂⁻, and probably to the sulfonic acid, -SO₃⁻, in all six copies of TGEV Mpro in the crystal. Such oxidation could occur during the time required for crystallization or during X-ray data collection, and would lead to inactivation of the enzyme. Refinement of the corresponding derivatives was, however, not successful.
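Distances of this kind are reported as a mean with a spread over the crystallographically independent copies. A minimal sketch of that calculation follows; the coordinates are invented placeholders (three chains only, for brevity), and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
import math
import statistics

# Placeholder coordinates in Angstrom for Cys144 S-gamma and His41 N-epsilon-2
# in each chain. These numbers are INVENTED for illustration; real values
# would be read from the refined coordinate file.
atom_pairs = {
    "A": ((0.00, 0.00, 0.00), (4.05, 0.00, 0.00)),
    "B": ((1.00, 2.00, 3.00), (1.00, 6.10, 3.00)),
    "C": ((5.00, 5.00, 5.00), (5.00, 5.00, 9.00)),
}


def distance_summary(pairs):
    """Mean and sample standard deviation of the interatomic distance
    over all chains in `pairs` (chain id -> (atom1_xyz, atom2_xyz))."""
    distances = [math.dist(a, b) for a, b in pairs.values()]
    return statistics.mean(distances), statistics.stdev(distances)


mean_d, sd_d = distance_summary(atom_pairs)
print(f"{mean_d:.2f} (+/-{sd_d:.2f}) A")  # 4.05 (+/-0.05) A for the placeholders
```

Replacing `statistics.stdev` with `statistics.pstdev` gives the population form, which is slightly smaller for six copies.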

It is generally assumed that the native state of the active site of papain-like cysteine proteinases is a thiolate–imidazolium ion pair formed by cysteine and histidine residues (Polgár, 1974). In proteinases of the papain family, an asparagine is the third member of the catalytic triad. Chymotrypsin and other members of this serine proteinase family have a catalytic triad consisting of Ser195...His57...Asp102. In HAV 3Cpro, Asp84 is present at the required position, although its side chain points away from His44, making its role disputable (Malcolm, 1995; Bergmann et al., 1997). PV 3Cpro, human rhinovirus (HRV) 3Cpro and HRV 2Apro have a glutamate or aspartate in the proper orientation to accept a hydrogen bond from the active site histidine (Matthews et al., 1994; Mosimann et al., 1997; Petersen et al., 1999). In contrast, TGEV Mpro has Val84 in the corresponding position, with its side chain pointing away from the catalytic site (Figure 6B and C). A buried water molecule is found in the place that normally would be occupied by the side chain of the third member of the catalytic triad. This water molecule makes hydrogen bonds to His41 Nδ1, His163 Nδ1 and Asp186 Oδ1 (Figure 6B). His163 is not conserved among coronavirus main proteinases and its substitution by leucine (Mpro-H163L) had no significant effect on the proteolytic activity in the standard peptide assay (see Materials and methods), as compared with the activity of the wild-type Mpro (Table I). Asp186 makes a salt bridge to Arg40 that appears to be required to maintain the active site geometry, since both Asp186 and Arg40 are absolutely conserved among coronaviruses. Through this (and other) interaction(s), the polypeptide segment 184–199, which connects domains II and III and is probably involved in substrate binding (see below), is held in the proper position. Taken together, the data contradict a direct involvement of His163 or Asp186 in catalysis, making the TGEV Mpro a clear case of a viral cysteine proteinase employing only a catalytic dyad.

Substrate hydrolysis by cysteine and serine proteinases occurs through a covalent tetrahedral intermediate resulting from attack of the active site nucleophile on the carbonyl carbon of the scissile bond. The developing oxyanion is stabilized by strong hydrogen bonds donated by amide groups of the enzyme. This so-called 'oxyanion hole' is also found in TGEV Mpro. It is made up by the main chain amides of Gly142, Thr143 and Cys144 (Figure 6B).

The specificity of Mpro for a very limited range of amino acids at the P1, P2 and P4 positions resembles the substrate specificity of picornavirus 3C proteinases (Palmenberg, 1990; Ziebuhr et al., 2000). This leads us to believe that, similarly to 3Cpro (Matthews et al., 1994; Bergmann et al., 1997; Mosimann et al., 1997), specific substrate binding by Mpro is ensured by well-defined S4, S2 and S1 specificity pockets. In order to visualize potential interactions with the substrate, we have modelled a pentapeptide representing the P5–P1 residues of a TGEV Mpro cleavage site (Asn–Ser–Thr–Leu–Gln, pp1a amino acids 2874–2878) into the substrate-binding cleft of Mpro (Figure 7). The model is based on the assumption that Mpro binds substrates in a manner analogous to that found in complexes of chymotrypsin-like proteinases with peptide inhibitors. X-ray structures have shown that the P4–P1 residues of peptide inhibitors assume a common main chain conformation when bound to these proteinases, with the P4 and P3 residues adopting a β conformation and the P2 and P1 residues assuming a specific main-chain conformation suitable to place their side chains in the pre-formed S1 and S2 specificity pockets (James et al., 1980; Fujinaga et al., 1985, 1987; Matthews et al., 1999). These studies lead us to suggest that the residues P5 to P3 of Mpro substrates may form an antiparallel β-sheet with segment 164–167 of the long strand eII on one side, and with the segment 186–191 (which links domains II and III) on the other. Hydrogen bonding interactions are likely between the main chain amide and carbonyl oxygen atoms of substrate residues Thr(P3), Ser(P4) and Asn(P5) and the main chain atoms of TGEV Mpro residues Glu165, Ser189 and Gly167 (see Figure 7).
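The P and P′ labels used throughout follow the usual convention of counting outwards from the scissile bond. As an illustrative sketch (the function name and interface are hypothetical), the labelling can be applied to the 15mer substrate peptide of Table I, H2N-VSVNSTLQSGLRKMA-COOH, whose scissile bond lies between Gln and Ser:

```python
def label_subsites(peptide: str, n_terminal_of_bond: int):
    """Label residues relative to the scissile bond: P1, P2, ... going
    N-terminal of the bond and P1', P2', ... going C-terminal.
    `n_terminal_of_bond` is the number of residues before the bond."""
    if not 0 < n_terminal_of_bond < len(peptide):
        raise ValueError("scissile bond must lie inside the peptide")
    labels = []
    for i, residue in enumerate(peptide):
        if i < n_terminal_of_bond:
            labels.append((f"P{n_terminal_of_bond - i}", residue))
        else:
            labels.append((f"P{i - n_terminal_of_bond + 1}'", residue))
    return labels


substrate = "VSVNSTLQSGLRKMA"
positions = dict(label_subsites(substrate, 8))  # cleavage after Gln8
print([positions[p] for p in ("P5", "P4", "P3", "P2", "P1")])  # ['N', 'S', 'T', 'L', 'Q']
print(positions["P1'"])  # 'S', i.e. Ser1 of the mature proteinase
```

The P5–P1 residues recovered this way are Asn, Ser, Thr, Leu and Gln, matching the modelled pentapeptide.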

Table I, notes: The sequence of the 15mer substrate peptide, H2N-VSVNSTLQSGLRKMA-COOH, was derived from the N-terminal Mpro autoprocessing site (residues shown in bold indicate the scissile bond). The activity of wild-type Mpro (encompassing 302 residues) was taken as 100% and the mean value of three experiments, which did not vary by more than 15%, is shown. (a) Proteolytic activities were determined using a peptide-based cleavage assay (see Materials and methods).

It has been shown for the HAV, HRV and PV 3Cpro enzymes that the imidazole side chain of a conserved histidine, which is located in the centre of a hydrophobic pocket, interacts with the P1 carboxamide side chain of the substrate. This interaction is generally accepted to determine the picornavirus 3Cpro specificity for glutamine at P1 (Matthews et al., 1994, 1999; Bergmann et al., 1997; Mosimann et al., 1997). Mutational analyses revealed that any replacement of His162 completely abolished the proteolytic activities of the HCoV and feline infectious peritonitis virus (FIPV) Mpro enzymes. The structure shows that the imidazole side chain of His162 is positioned suitably to interact with a P1 glutamine side chain. His162 is located at the very bottom of a hydrophobic pocket which is formed by residues Phe139 and the main-chain atoms of Ile140, Leu164, Glu165 and His171. The side chain of Glu165 forms an ion pair (2.96 ± 0.14 Å) with His171. This salt bridge is itself on the periphery of the molecule, forming part of the 'outer wall' of the S1 subsite. Accordingly, mutants of the HCoV 229E Mpro, in which the residue equivalent to His171 had been replaced by alanine, serine or threonine, retained significant proteolytic activities. In order to interact with the P1 glutamine side chain of the substrate, His162 has to maintain a neutral state over a wide pH range.
Most probably, this is achieved by two important interactions: (i) stacking onto the phenyl ring of Phe139, at a distance of 3.53 ± 0.18 Å; and (ii) accepting a hydrogen bond from the buried Tyr160 hydroxyl group which has no other hydrogen-bonding partner. The role proposed for the hydroxyl group of Tyr160 is strongly supported by FIPV Mpro mutagenesis studies in which the proteolytic activities of Y160F, Y160G, Y160A and Y160T mutants were shown to be dramatically reduced. Tyr160 is part of the absolutely conserved coronavirus Mpro sequence signature, Tyr160-X-His162 (Figures 1 and 2), whereas Gly(Ala)-X-His is found at the equivalent sequence position in most 3C and 3C-like proteinases (Gorbalenya et al., 1989a). Accordingly, in the 3C and 3C-like proteinases, stabilization of histidine in the neutral tautomeric state has to be ensured by other residues. Notably, in the case of PV 3Cpro, this involves a tyrosine residue (Tyr138) which, however, is provided by a different part of the structure (β-strand cII; Mosimann et al., 1997). For HAV 3Cpro, other mechanisms are proposed (Bergmann et al., 1997).

Halfway down the S1 subsite of TGEV Mpro, there is dumbbell-shaped electron density which we have assigned to two water molecules, although theoretically they are too close to one another (2.10 ± 0.16 Å). One of them makes a hydrogen bond with Nε2 of His162, while the second one, unusually for water, makes no additional contacts. In our model of the substrate complex, these two water molecules mark the position of the carboxamide group of the P1 glutamine side chain.

Coronavirus main proteinases have a strong preference for leucine at the P2 position (Ziebuhr et al., 2000). The putative S2 subsite identified in the structure is a hydrophobic pocket that is suitably positioned and large enough to accommodate a leucine side chain easily. The S2 pocket is lined by the side chains of Leu164 (the main chain of which forms part of the S1 subsite, see above), Pro188, Ile51, His41 and Thr47 (Figure 7). In our electron density maps, part of the S2 subsite (of all six copies of the monomer) harbours extra electron density that we interpreted as an MPD molecule from the crystallization medium. In the HAV 3Cpro, the corresponding subsite is formed by different parts of the polypeptide chain. It is also smaller and can accommodate the side chains of serine and threonine (Bergmann et al., 1997).

The quaternary arrangement of the proteinase is a homodimer, with three copies in the asymmetric unit (monomers A and B, C and D, and E and F). All dimers have approximate C2 symmetry (Figure 3) and ~1580 (±199) Å² of each monomer, i.e. 11–12% of its solvent-accessible surface, are buried upon dimerization. The dimer formation is driven mainly by intermolecular interactions between domains II and III of one monomer and the N-terminal residues of the other (see below for further details). In contrast, the domain III–domain III interface appears to be the consequence rather than the cause of other intermolecular interactions. It involves a relatively small area of 337 ± 45 Å² and comprises only two hydrogen bonds, between the amide group of Gly281 (molecule A) and the main-chain oxygen of Ser279 (molecule B), as well as its symmetry mate, Gly281B...Ser279A (3.22 ± 0.37 Å, averaged over all six monomers).
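The buried area and the buried fraction quoted above together imply the total solvent-accessible surface of an isolated monomer. A small worked calculation, using only the two figures from the text (the function name is illustrative), makes the implied range explicit:

```python
BURIED_AREA = 1580.0  # Angstrom^2 buried per monomer, from the text


def implied_total_sasa(buried_area: float, buried_fraction: float) -> float:
    """Total solvent-accessible surface implied by a buried area that
    represents `buried_fraction` (0-1) of the monomer surface."""
    if not 0.0 < buried_fraction <= 1.0:
        raise ValueError("buried_fraction must be in (0, 1]")
    return buried_area / buried_fraction


low = implied_total_sasa(BURIED_AREA, 0.12)
high = implied_total_sasa(BURIED_AREA, 0.11)
print(f"{low:.0f}-{high:.0f} A^2")  # 13167-14364 A^2
```

The result, roughly 13 200–14 400 Å² per monomer, is an inference from the rounded percentages and carries the same ±199 Å² uncertainty as the buried area itself.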

Interestingly, the N-terminal residues of each monomer are relatively close to the substrate-binding site of the other monomer in the dimer. The following observations for monomer A hold true for all other monomers. The NH₃⁺ group of Ser1A, which is the P1′ residue of the autocleavage reaction of TGEV Mpro, is 11.9 ± 1.6 Å from the active site Cys144B Sγ of the second molecule in the dimer but as much as 34.2 ± 0.9 Å away from its own active site cysteine. Ser1A is in contact with residues participating in the substrate-binding site of monomer B. Its NH₃⁺ group makes a salt bridge (4.99 ± 1.04 Å) to the carboxylate of Glu165B (Figure 8). This glutamate, which is absolutely conserved among coronaviruses, is part of the S1 subsite (see above), where it also interacts with His171. Although these two side chains form the 'wall' of the specificity site, they have their polar groups oriented towards the surface of the proteinase molecule and away from the substrate's P1 glutamine. An intermolecular ionic interaction between Arg4A and Glu286B (6.0 ± 0.7 Å) appears to play a role in positioning the N-terminal residues. Because of the 2-fold non-crystallographic symmetry (NCS), the same interaction occurs between Arg4B and Glu286A. Residues 6A–8A form a short β-strand interacting with strand cII of monomer B (at Val124B). Most of the interactions between the N-terminus of molecule A and the region next to the S1 subsite of molecule B constitute a perfect fit. Given the fact that the P′ residues in serine and cysteine proteinases constitute the leaving group of the cleavage reaction and, in coronavirus main proteinases, are not subject to stringent specificity requirements, it is quite conceivable that, after autoproteolysis, the N-terminus of one monomer slides over the active site of the partner monomer and adopts the position seen in our crystal structure, i.e. with Ser1A interacting with Glu165B at the 'outer wall' of the S1 subsite.
This, in turn, would suggest that the dimer we are seeing corresponds to the product of the autolysis reaction and that this occurs in trans. Molecular modelling revealed that binding of the Mpro N-terminus in the active site cleft of the same molecule would require remodelling of the entire N-terminal segment and beyond (residues 1–13; data not shown), making cleavage in cis less likely. There is additional experimental evidence supporting these conclusions. First, dilution experiments with MHV Mpro translated in vitro contradict cis-cleavage activity (Lu et al., 1996). Secondly, the fact that, early in infection, Mpro remains part of a relatively stable 150 kDa precursor protein in which it is flanked by hydrophobic domains (Schiller et al., 1998) argues against rapid autoprocessing in cis. The proposed model of intermolecular self-processing would imply that components of the replication complex could first be anchored to membranes (i.e. the site of RNA replication) in an uncleaved form, and only later, when the precursor proteins accumulate to high local concentrations, will Mpro release itself by intermolecular cleavage, thereby triggering the complete spectrum of trans-processing reactions.

A specific conformation of the N-terminal segment allows it to 'squeeze' residues 1–8 in between domains II and III of the same monomer and domains II and III of monomer B (see above and Figure 8). In this context, the N-terminus also interacts with domains II and III of its own protomer. For example, the side-chain amino group of Lys5A makes strong intramolecular hydrogen bonds with Ser110A Oγ of domain II (2.83 ± 0.15 Å), and with the Glu286A main chain oxygen (2.80 ± 0.07 Å), as well as with Glu291A Oε1 (2.74 ± 0.13 Å) of domain III. Furthermore, the side chain of Leu3A completes a hydrophobic patch on domain III which includes Phe206A, Ala209A, Phe287A, Val292A, the Cβ atom of Gln295A and Met296A; these residues belong to helices B and F. All sequenced members of the coronavirus proteinase family have a hydrophobic residue in position 3, while glycine is absolutely conserved in position 2 (see Figure 1). The latter residue adopts the αL conformation which is easily accessible only to glycine. To investigate the functional significance of these interactions, a recombinant protein, MproΔ1–5, in which the N-terminal residues Ser1–Lys5 were removed from the Mpro sequence, was expressed and tested for proteolytic activity in a trans-cleavage assay using a 15mer peptide representing the N-terminal TGEV Mpro autoprocessing site. As shown in Table I, the activity of MproΔ1–5 was decreased to only 0.3% of the Mpro activity. We conclude from these data that, indeed, residues 1–5 may be critically involved in stabilizing the mutual orientation of domains II and III and thus, indirectly, in maintaining the proper orientation of the intervening loop region (residues 184–199). If this hypothesis is correct, then the deletion of domain III should have similarly detrimental effects on the proteolytic activity and, in fact, the published data (see Introduction) seem to support this conclusion.
To corroborate this hypothesis further, an additional set of Mpro mutants was characterized in which we used the structural information to remove domain III completely. In this approach, the probability of domain III misfolding, which might have been the cause of Mpro inactivation in previous studies using randomly 'truncated' coronavirus main proteinases (Lu and Denison, 1997; Ziebuhr et al., 1997; Ng and Liu, 2000), should be significantly reduced. The TGEV Mpro deletion mutants tested for activity comprised (i) domains I and II (MproΔ184–302); (ii) domains I and II together with the entire loop region (MproΔ200–302); or (iii) domains I and II combined with the loop region but lacking the five N-terminal residues (MproΔ1–5/Δ200–302). As Table I shows, MproΔ200–302 had clearly detectable (albeit significantly reduced) activity (0.4% of Mpro). Similarly, the mutant MproΔ1–5/Δ200–302 had significantly reduced activity (0.6% of Mpro). In sharp contrast, no activities were detectable for MproΔ184–302 and the active site mutant, Mpro-C144A (the latter being used as a negative control). The fact that residues 184–199 proved to be indispensable for proteolytic activity supports our model of substrate binding (Figure 7) in which residues of the loop are predicted to be critically involved in the formation of a β-sheet-type structure with the substrate (see above). The data also show that an intact N-terminus and the C-terminal domain are required for full activity. The structure suggests that the additional α-helical domain III as well as the N-terminal residues help fix domain II and the loop 184–199 in a catalytically competent orientation. It will be interesting to investigate whether similar mechanisms are also operating in other 3C-like proteinases with (smaller) C-terminal domains (e.g. arteriviruses and potyviruses; Ziebuhr et al., 2000).

Beyond its presumed role in proteolytic activity, domain III may have other functions, which remain to be determined. In contrast to picornavirus 3C proteinases for which RNA-binding activities are well established (Andino et al., 1993; Leong et al., 1993; Xiang et al., 1995), the Mpro structure does not support such an activity for the coronavirus main proteinase. Thus, calculation of the electrostatic potential (Nicholls et al., 1991) does not reveal an overall basic character of domain III, nor are there distinct patches of basic or aromatic residues (data not shown). The same applies to domains I and II. Also, the conserved picornavirus sequence motif, KFRDI, located between domains I and II, as well as the small helices and reverse turns that together form the RNA-binding site of HAV 3Cpro (Bergmann et al., 1997) are missing in the TGEV Mpro structure.

The crystal structure of TGEV Mpro shows that coronaviruses have evolved proteinases in which a thiolate–imidazolium catalytic dyad has been combined with a two-β-barrel fold. This framework is extended further by a novel α-helical domain that, together with the N-terminal residues 1–5, appears to be involved in proteolytic activity by maintaining the proper positioning of the presumed substrate-binding loop, 184–199. We are confident that the first crystal structure of a non-picornaviral chymotrypsin-like cysteine proteinase will facilitate further molecular modelling of other members of the huge family of RNA viral '3C-like' enzymes for which structural information is still lacking.

Protein purification and crystallization

Recombinant TGEV Mpro was expressed and purified as previously described for the HCoV and FIPV main proteinases. Briefly, the coding sequence of the TGEV Mpro was inserted into the XmnI and BamHI sites of pMal-c2 plasmid DNA (New England Biolabs). The resulting plasmid, pMal-Mpro, was used to transform Escherichia coli TB1 cells. The maltose-binding protein (MBP)–TGEV Mpro fusion protein was purified by amylose–agarose chromatography, cleaved with factor Xa, and the recombinant Mpro (residues Ser1–Gln302) was purified by hydrophobic interaction, anion exchange and size exclusion chromatography. The purified and concentrated TGEV Mpro (12.5 mg/ml) was stored in 12 mM Tris–HCl pH 7.5, 120 mM NaCl, 1 mM dithiothreitol (DTT), 0.1 mM EDTA. This protein solution was used to crystallize Mpro by the hanging drop vapour diffusion method at 4°C. The best crystals, which were of triangular shape and had dimensions of ~0.3 × 0.25 × 0.3 mm, were obtained by using 100 mM HEPES pH 8.8, 1.8 M ammonium sulfate, 6% MPD, 5 mM DTT and 4% dioxane as the reservoir and grew in ~10 days.

The M pro structure could not be solved using conventional molecular replacement techniques. Therefore, selenomethionine (SeMet)-substituted TGEV M pro was produced. The coding sequence of the MBP–TGEV M pro fusion protein was inserted into pET-11d (Novagen), and the resulting plasmid, pET-TGEV-M pro, was used to transform the methionine-auxotrophic 834(DE3) E.coli strain (Novagen), which was propagated in minimal medium containing 40 mg/ml seleno-L-methionine. The SeMet-substituted TGEV M pro was purified as described above and concentrated to 9.5 mg/ml. Crystals of the SeMet-substituted M pro were grown as described for the native protein but using 2 M ammonium sulfate and 8% MPD.

Crystals used for data collection were rinsed with mustard oil and cryocooled in liquid nitrogen. Diffraction data up to 1.95 Å resolution were collected from native crystals at 100 K on the X-ray diffraction beamline at ELETTRA (Sincrotrone Trieste, Trieste, Italy), using a Mar165 CCD detector (Table II). MAD data sets were collected to 2.8 Å resolution at four wavelengths using a Mar165 CCD detector on beamline BW7A of the EMBL Outstation at DESY (Hamburg, Germany). SeMet data sets were collected for the f″ maximum and f′ minimum wavelengths. Additional data were collected at remote wavelengths below and above the Se K-edge (Table II). Data integration and scaling were performed using DENZO and SCALEPACK (Otwinowski and Minor, 1997).

The unit cell dimensions, as well as the self-rotation function (ALMN; CCP4, 1994), implied that several monomers were present in the asymmetric unit. A Matthews coefficient (Matthews, 1968) of 2.3 Å³/Da and a solvent content of 51% were obtained assuming six molecules in the asymmetric unit. The bottleneck of the structure determination was the identification of the 60 selenium positions (six monomers with 10 Se each). Solving the problem by SnB v2.0 (Weeks and Miller, 1999) required data of increased precision, which were obtained by averaging of several data sets and monitoring the process by R pim (Weiss and Hilgenfeld, 1997). Only after we had combined three merged peak-wavelength data sets with two merged edge-wavelength data sets (redundancy = 18) were we able to obtain 105 solutions (from 5000 trials) with significantly reduced minimal function values (R min = 0.49, CC = 0.51; Hauptman, 1991) (details to be published elsewhere). The positions of the best 60 atom solutions from SnB were examined for NCS. In total, 37 positions were found to obey a 2-fold NCS. This symmetry predicted a further 11 positions. All 48 positions were used in MLPHARE (CCP4, 1994) for phasing, followed by solvent flattening and NCS averaging in DM (Cowtan and Main, 1996). The resulting electron density maps were of sufficient quality for chain tracing. The first monomer was built manually into the experimental electron density map, using the program 'O' (Jones et al., 1991). All other monomers were generated by NCS. NCS restraints were applied during the initial stages of refinement at low resolution and later gradually released as the resolution limit was extended to 1.96 Å.

Cycles of adjustments to the model with O and subsequent refinement using the program CNS (Brünger et al., 1998) converged to an R free of 0.256 and a crystallographic R-factor of 0.210. Data quality and refinement statistics are given in Table III. The quality of the structural model and its agreement with the structure factors were checked with programs PROCHECK (Laskowski et al., 1993), WHATCHECK (Vriend, 1990) and SFCHECK (Vaguine et al., 1999). Solvent accessibility was calculated using the algorithm of Lee and Richards (1971; program NACCESS), using a solvent probe of radius 1.4 Å. The molecular diagrams were drawn using MOLSCRIPT (Kraulis, 1991) and rendered with RASTER 3D (Bacon and Anderson, 1988). Atomic coordinates and structure factors have been submitted to the RCSB Protein Data Bank under accession code 1LVO.

Data collection statistics (table fragment; column headings and the label of the first row were not recovered)
(unlabelled row): 5.4 (2.9) 3.8 3.8 3.9 3.8 3.9 3.7 3.6 2.9
I/σ(I) c: 13.5 (4.0) 5.4 4.7 4.8 6.1 4.1 4.1 4.9 2.5

a X-ray diffraction beamline at ELETTRA, Trieste, equipped with a Mar CCD detector.
b Wiggler beamline of EMBL at DESY, Hamburg, equipped with a Mar CCD detector.
c Highest resolution bin in parentheses.
d The inflection point and peak wavelengths were collected in inverse beam mode, whereas the remote wavelengths were collected at the low energy side of the Se edge where there is little anomalous signal and, as a result, no inverse beam data were collected.
e P1, P2, P3 = peak wavelengths 1, 2 and 3; E1, E2 = edge wavelengths 1 and 2 (point of inflection); H1, H2 = high energy remote wavelengths 1 and 2; L1 = low energy remote wavelength.
f $R_{\mathrm{merge}} = 100 \times \sum_{hkl} \sum_{i} |I_i - \langle I \rangle| \,/\, \sum_{hkl} \sum_{i} I_i$, where $I_i$ is the observed intensity and $\langle I \rangle$ is the average intensity from multiple measurements.
g $R_{\mathrm{rim}} = 100 \times \sum_{hkl} [N/(N-1)]^{1/2} \sum_{i} |I_i - \langle I \rangle| \,/\, \sum_{hkl} \sum_{i} I_i$, where $N$ is the number of times a given reflection has been measured. This quality indicator corresponds to an R sym that is independent of the redundancy of the measurements.
h $R_{\mathrm{pim}} = 100 \times \sum_{hkl} [1/(N-1)]^{1/2} \sum_{i} |I_i - \langle I \rangle| \,/\, \sum_{hkl} \sum_{i} I_i$. This factor provides information about the average precision of the data.
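The three merging statistics defined in the table footnotes differ only in the per-reflection weight applied to the summed absolute deviations. The following Python sketch illustrates this under stated assumptions: the input is a list of (hkl, intensity) pairs in which symmetry-equivalent observations already share one index, the function name `merging_r_factors` is hypothetical, and reflections measured only once are skipped because the N − 1 term is undefined for them. It is not the procedure implemented in SCALEPACK or used by the authors.

```python
from collections import defaultdict
from math import sqrt


def merging_r_factors(observations):
    """Return R_merge, R_rim and R_pim (in percent) for a set of observations.

    observations: iterable of ((h, k, l), intensity) pairs. Observations of
    the same reflection must carry the same (h, k, l) key (assumption).
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(float(intensity))

    num_merge = num_rim = num_pim = denom = 0.0
    for intensities in groups.values():
        n = len(intensities)
        if n < 2:
            # Assumption: singly measured reflections are left out entirely.
            continue
        mean = sum(intensities) / n
        deviation = sum(abs(i - mean) for i in intensities)
        num_merge += deviation                      # weight 1
        num_rim += sqrt(n / (n - 1)) * deviation    # redundancy-independent
        num_pim += sqrt(1 / (n - 1)) * deviation    # precision-indicating
        denom += sum(intensities)

    if denom == 0:
        raise ValueError("no reflection was measured more than once")

    return {
        "R_merge": 100 * num_merge / denom,
        "R_rim": 100 * num_rim / denom,
        "R_pim": 100 * num_pim / denom,
    }


if __name__ == "__main__":
    # Made-up intensities, for illustration only.
    demo = [((1, 0, 0), 100.0), ((1, 0, 0), 110.0),
            ((0, 1, 0), 50.0), ((0, 1, 0), 50.0), ((0, 1, 0), 56.0)]
    for name, value in merging_r_factors(demo).items():
        print(f"{name}: {value:.2f}%")
```

Because the R pim weight shrinks as N grows, merging additional data sets lowers R pim even when R merge stays constant or rises, which is why it is a suitable monitor when several data sets are averaged to increase precision.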

For the expression of M pro proteins with N- and C-terminal deletions (M pro Δ184–302, M pro Δ200–302, M pro Δ1–5 and M pro Δ1–5/Δ200–302), the corresponding M pro coding sequences were amplified by PCR and inserted into XmnI–BamHI-digested pMal-c2 plasmid DNA. To substitute the M pro residues Cys144 (by Ala) and His163 (by Leu), the corresponding codons were replaced in pMal-M pro by site-directed mutagenesis using a recombination-PCR method (Yao et al., 1992). The details of the primers used for cloning and mutagenesis and the amino acid sequences of the recombinant proteins expressed and tested for proteolytic activity are given in Table I. The plasmid DNAs were transformed into E.coli TB1 cells and the recombinant proteins were synthesized, affinity purified and cleaved with factor Xa as described previously. The purity and structural integrity of the mutant proteins were analysed by SDS–PAGE. The control protein for this experiment, wild-type TGEV M pro, was purified in an identical manner. Enzymatic activities of the mutant proteins were measured by using a peptide cleavage assay with a peptide substrate representing the N-terminal TGEV M pro autoprocessing site (H2N-VSVNSTLQ↓SGLRKMA-COOH; the arrow marks the scissile Gln–Ser bond that is cleaved by M pro).
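To relate the assay peptide to the subsite nomenclature used for M pro substrates (P1 glutamine, P2 leucine), the residues on either side of the scissile bond can be labelled by position. The short Python sketch below does this; the helper name `label_subsites` is hypothetical, and the position of the scissile bond (between Gln and Ser) is taken as an assumption consistent with the mature proteinase beginning at Ser1.

```python
def label_subsites(peptide, n_side_length):
    """Map Schechter-Berger positions (P1, P2, ..., P1', P2', ...) to residues.

    peptide: one-letter amino acid sequence, N- to C-terminus.
    n_side_length: number of residues N-terminal to the scissile bond.
    """
    if not 0 < n_side_length < len(peptide):
        raise ValueError("scissile bond must lie inside the peptide")

    labels = {}
    for idx, residue in enumerate(peptide):
        if idx < n_side_length:
            # Non-primed side: numbering counts up away from the bond.
            labels[f"P{n_side_length - idx}"] = residue
        else:
            # Primed side: P1' is the first residue after the bond.
            labels[f"P{idx - n_side_length + 1}'"] = residue
    return labels


SUBSTRATE = "VSVNSTLQSGLRKMA"
# Assumption: cleavage occurs between Gln and Ser of the single "QS" pair.
N_SIDE = SUBSTRATE.index("QS") + 1
SITES = label_subsites(SUBSTRATE, N_SIDE)

if __name__ == "__main__":
    for position in ("P2", "P1", "P1'", "P2'"):
        print(position, SITES[position])
```

With this labelling the residues that occupy the S2 and S1 subsites of the enzyme are the leucine and glutamine immediately preceding the cleaved bond.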

Picornaviral 3C cysteine proteinases have a fold similar to chymotrypsin-like serine proteinases

Poliovirus RNA synthesis utilizes an RNP complex formed around the 5′-end of viral RNA

A fast algorithm for rendering space-filling molecule pictures

Viral cysteine proteases are homologous to the trypsin-like family of serine proteases: structural and functional implications

The refined crystal structure of the 3C gene product from hepatitis A virus: specific proteinase activity and RNA recognition

Crystallography and NMR system: a new software suite for macromolecular structure determination

Nidovirales: a new order comprising Coronaviridae and Arteriviridae

The CCP4 suite: programs for protein crystallography

Phase combination and cross validation in iterated density-modification calculations. Acta Crystallogr. D, 52, 43–48.

Complete sequence (20 kilobases) of the polyprotein-encoding gene 1 of transmissible gastroenteritis virus

Refined structure of α-lytic protease at 1.7 Å resolution

Crystal and molecular structures of the complex of α-chymotrypsin with its inhibitor turkey ovomucoid third domain at 1.8 Å resolution

Cysteine proteases of positive strand RNA viruses and chymotrypsin-like serine proteases. A distinct protein superfamily with a common structural fold

Crystallographic Computing 5, From Chemistry to Biology

Conservation of substrate specificities among coronavirus main proteases

Mutational analysis of the active centre of coronavirus 3C-like proteases

Selenomethionyl proteins produced for analysis by multiwavelength anomalous diffraction (MAD): a vehicle for direct determination of three-dimensional structure

Protein structure comparison by alignment of distance matrices

Structure of product and inhibitor complexes of Streptomyces griseus protease A at 1.8 Å resolution: a model for serine protease catalysis

Improved methods for building protein models in electron density maps and the location of errors in these models

MOLSCRIPT: a program to produce both detailed and schematic plots of protein structures

PROCHECK: a program to check the stereochemical quality of protein structures

The interpretation of protein structures: estimation of static accessibility

protease 3C (3C pro) binds specifically to the 5′-noncoding region of the viral RNA

Characterisation and mutational analysis of an ORF 1a-encoding proteinase domain responsible for proteolytic processing of the infectious bronchitis virus 1a/1b polyprotein

Intracellular and in vitro-translated 27-kDa proteins contain the 3C-like proteinase activity of the coronavirus MHV-A59

Determinants of mouse hepatitis virus 3C-like proteinase activity

Identification and characterization of a serine-like proteinase of the murine coronavirus MHV-A59

The picornaviral 3C proteinases: cysteine nucleophiles in serine proteinase folds

Solvent content of protein crystals

Structure of human rhinovirus 3C protease reveals a trypsin-like polypeptide fold, RNA-binding site, and means for cleaving precursor polyprotein

Structure-assisted design of mechanism-based irreversible inhibitors of human rhinovirus 3C protease with potent antiviral activity against multiple rhinovirus serotypes

Refined X-ray crystallographic structure of the poliovirus 3C gene product

Further characterization of the coronavirus infectious bronchitis virus 3C-like proteinase and determination of a new cleavage site

Protein folding and association: insights from the interfacial and thermodynamic properties of hydrocarbons

Processing of X-ray diffraction data collected in oscillation mode

Complete genome sequence of transmissible gastroenteritis coronavirus PUR46-MAD clone and evolution of the purdue virus cluster

The structure of the 2A proteinase from a common cold virus: a proteinase responsible for the shut-off of host-cell protein synthesis

Mercaptide–imidazolium ion-pair: the reactive nucleophile in papain catalysis

Transmissible gastroenteritis virus

Processing of the coronavirus MHV-JHM polymerase polyprotein: identification of precursors and proteolytic products spanning 400 kilodaltons of ORF1a

Expression and characterization of a recombinant murine coronavirus 3C-like proteinase

The CLUSTAL X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools

Structure of α-chymotrypsin refined at 1.68 Å resolution

SFCHECK: a unified set of procedures for evaluating the quality of macromolecular structure-factor data and their agreement with the atomic model

WHAT IF: a molecular modeling and drug design program

The design and implementation of SnB version 2.0

On the use of merging R-factor as a quality indicator for X-ray data

Interaction between the 5′-terminal cloverleaf and 3AB/3CDpro of poliovirus is essential for RNA replication

Site-directed mutagenesis of herpesvirus glycoprotein phosphorylation sites by recombination polymerase chain reaction

Processing of the human coronavirus 229E replicase polyproteins by the virus-encoded 3C-like proteinase: identification of proteolytic products and cleavage sites common to pp1a and pp1ab

Characterization of a human coronavirus (strain 229E) 3C-like proteinase activity

Biosynthesis, purification, and characterization of the human coronavirus 229E 3C-like proteinase

Virus-encoded proteinases and proteolytic processing in the Nidovirales

We thank the staff of ELETTRA (Trieste, Italy) and the EMBL Outstation at DESY (Hamburg, Germany) for help with data collection. Access to these research infrastructures was supported by the European Commission (contract numbers HPRI-CT-1999-00033 and HPRI-CT-1999-00017, respectively). We thank M.S.Weiss and D.Pal for their advice and helpful discussions. This work was supported by grants from the Deutsche Forschungsgemeinschaft awarded to J.Z. (Zi 618/2