key: cord-0706064-fjfc3rto authors: Xue, Xiaoyu; Yang, Haitao; Shen, Wei; Zhao, Qi; Li, Jun; Yang, Kailin; Chen, Cheng; Jin, Yinghua; Bartlam, Mark; Rao, Zihe title: Production of Authentic SARS-CoV M(pro) with Enhanced Activity: Application as a Novel Tag-cleavage Endopeptidase for Protein Overproduction date: 2007-02-23 journal: J Mol Biol DOI: 10.1016/j.jmb.2006.11.073 sha: 31f77aa2bb95af8140123e95400b3531a05d0180 doc_id: 706064 cord_uid: fjfc3rto The viral proteases have proven to be the most selective and useful for removing the fusion tags in fusion protein expression systems. As a key enzyme in the viral life-cycle, the main protease (M(pro)) is most attractive for drug design targeting the SARS coronavirus (SARS-CoV), the etiological agent responsible for the outbreak of severe acute respiratory syndrome (SARS) in 2003. In this study, SARS-CoV M(pro) was used to specifically remove the GST tag in a new fusion protein expression system. We report a new method to produce wild-type (WT) SARS-CoV M(pro) with authentic N and C termini, and compare the activity of WT protease with those of three different types of SARS-CoV M(pro) with additional residues at the N or C terminus. Our results show that additional residues at the N terminus, but not at the C terminus, of M(pro) are detrimental to enzyme activity. To explain this, the crystal structures of WT SARS-CoV M(pro) and its complex with a Michael acceptor inhibitor were determined to 1.6 Å and 1.95 Å resolution respectively. These crystal structures reveal that the first residue of this protease is important for sustaining the substrate-binding pocket and inhibitor binding. This study suggests that SARS-CoV M(pro) could serve as a new tag-cleavage endopeptidase for protein overproduction, and the WT SARS-CoV M(pro) is more appropriate for mechanistic characterization and inhibitor design. 
In the age of proteomics, the production of pure protein in a high-throughput manner is required for both structural and functional studies. Especially for protein structure studies, one of the main bottlenecks is to produce adequate quantities of soluble and properly folded recombinant proteins. 1 Fusion protein expression systems have been widely used for this purpose in basic research and in industry. Fusion domains (or small "tags") are expressed as partners of the passenger proteins and are generally removed after purification, as they might interfere with the function or other biochemical or biophysical characteristics of the protein. 2 Hence, some endopeptidases, such as bovine thrombin, bovine Factor Xa, human rhinovirus 3C protease and tobacco etch virus (TEV) protease, are routinely used to remove fusion domains. Of these, the viral proteases, e.g. human rhinovirus 3C protease 3, 4 and TEV protease, 5, 6 have proven to be the most selective and useful to date. Therefore, the development of novel viral proteases with high specificity and activity will be helpful to proteomics research.

† X.X. and H.Y. made equal contributions to this work. Abbreviations used: SARS, severe acute respiratory syndrome; SARS-CoV, SARS coronavirus; Mpro, main protease; GST, glutathione-S-transferase; WT, wild-type enzyme with authentic N and C termini; GS-WT, wild-type enzyme with two additional amino acid residues (GS) at the N terminus; GPLGS-WT, wild-type enzyme with five additional amino acid residues (GPLGS) at the N terminus; WT-GPH6, wild-type enzyme with eight additional amino acid residues (GPHHHHHH) at the C terminus. E-mail address of the corresponding author: raozh@xtal.tsinghua.edu.cn

The SARS coronavirus (SARS-CoV) is the etiological agent responsible for the global outbreak of a life-threatening disease that caused approximately 800 deaths worldwide.
[7] [8] [9] [10] [11] The coronavirus main protease (Mpro), which plays a key role in mediating viral replication and transcription, has been identified as the most attractive target for anti-SARS drug design. [12] [13] [14] In 2003, our group published the first crystal structure of SARS-CoV Mpro. 14 Thereafter, several other groups published the structures of Mpro, [15] [16] [17] and of its complex with an aza-peptide epoxide inhibitor. 18 Mpro forms a homodimer both in the crystal 14, 17, 18 and in solution. 19, 20 Each protomer consists of three domains: domains I and II resemble chymotrypsin, whereas domain III has a globular cluster of five, mostly antiparallel, α-helices. SARS-CoV Mpro has a catalytic dyad consisting of His41 and Cys145, and its substrate-binding pocket is located in the cleft between domains I and II. The N-terminal residues 1-7 of domain I (the N-finger) of Mpro are considered to have an important role in its proteolytic activity, 14, [21] [22] [23] although reports on their importance for dimerization have been inconsistent. [21] [22] [23] In particular, mutation of Arg4, which is involved in forming an ion pair, results in a fourfold decrease in activity. 21 Deletion of the N-terminal residues 1-7 renders SARS-CoV Mpro almost completely inactive. 23 SARS-CoV Mpro has been well characterized, and two of its properties drew our attention. First, it is highly selective for its substrate sequence. 24 Second, it displays a high level of proteolytic activity, although inconsistent kinetic parameters have been reported, with kcat/Km ranging from 20 to 29,000 M−1 s−1. 19, 20, [25] [26] [27] These properties indicate that SARS-CoV Mpro is a promising candidate endopeptidase for removing fusion tags.
In this study, SARS-CoV Mpro was used to remove the fusion domain during overproduction of calbindin D28k (a member of the calmodulin superfamily) from a novel glutathione-S-transferase (GST) fusion protein expression system. [28] [29] [30] [31] To obtain a peptidase with high catalytic efficiency, we developed a new method to produce wild-type (WT) SARS-CoV Mpro with authentic N and C termini, and investigated the differences in activity among several SARS-CoV Mpro constructs. The crystal structures of WT Mpro and its complex with an inhibitor were solved to explain the mechanism of enhanced activity.

In order to take advantage of SARS-CoV Mpro as an endopeptidase, a new GST fusion protein expression vector (designated pGSTM) was constructed from the fusion vector pGEX-6p-1 (GE Healthcare). The resulting plasmid retains the entire multiple cloning site of pGEX-6p-1. However, the linker between the GST gene and the gene of interest was replaced by nucleotides encoding the unique 11-residue N-terminal autocleavage site of SARS-CoV Mpro (see Figure 1(a)). Sequencing confirmed that the cleavage site (TSAVLQSGFRK) was inserted correctly.

Expression and purification of calbindin D28k using the pGSTM expression system

Calbindin D28k, a member of the calmodulin superfamily, has been proposed to function as an important intracellular Ca2+-buffering protein. 31 We chose it as our protein of interest for structural studies. This protein was cloned, expressed and purified using our new protein expression system. After one-step purification, calbindin D28k, carrying five additional residues (SGFRK-) at the N terminus and an authentic C terminus, was analyzed by SDS-PAGE and shown to be >90% pure (Figure 1(b)). Around 70-80 mg of calbindin D28k was obtained from 1 l of bacterial culture. Crystallographic analysis of calbindin D28k is underway.
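Why the released passenger protein carries SGFRK at its N terminus can be illustrated with a short Python sketch. It models Mpro cleavage, purely as a simplifying assumption, as an exact match on the 11-residue pGSTM linker TSAVLQSGFRK, cutting the Gln-Ser (P1-P1′) bond; real Mpro specificity is broader than a string match, and the names `GST` and `PASSENGER` in the toy fusion are hypothetical placeholders.

```python
# Minimal sketch (assumption): Mpro cleavage modelled as an exact match on
# the pGSTM linker, cutting the Gln-Ser (P1-P1') peptide bond.
SITE = "TSAVLQSGFRK"              # linker encoded in pGSTM (from the text)
CUT_OFFSET = SITE.index("Q") + 1  # cut after Gln (P1), before Ser (P1')


def mpro_cleave(fusion: str):
    """Split a fusion sequence at the pGSTM Mpro site.

    Returns (upstream fragment, downstream fragment).
    """
    start = fusion.find(SITE)
    if start < 0:
        raise ValueError("no Mpro cleavage site found")
    cut = start + CUT_OFFSET
    return fusion[:cut], fusion[cut:]


# Hypothetical toy fusion: "GST" stands in for the GST domain and
# "PASSENGER" for the protein of interest.
tag_fragment, released = mpro_cleave("GST" + SITE + "PASSENGER")
print(tag_fragment)  # GSTTSAVLQ
print(released)      # SGFRKPASSENGER
```

The P1′-P5′ residues (SGFRK) stay on the downstream fragment, which matches the five extra N-terminal residues reported for calbindin D28k.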
In an earlier study, wild-type SARS-CoV Mpro with two additional residues (GS) at the N terminus (GS-WT) was used to determine kinetic parameters. 12 To avoid any potential effects of extraneous N-terminal residues on enzyme activity, we developed a new method to produce WT SARS-CoV Mpro. First, the four amino acids AVLQ, which correspond to the P4-P1 sites of the N-terminal autocleavage sequence of SARS-CoV Mpro, were introduced between the GST tag and the first residue of the protease, so that the authentic N terminus would be generated by autocleavage during protein expression. Second, the eight amino acids GPHHHHHH (abbreviated as GPH6, where GP corresponds to the P1′ and P2′ sites of rhinovirus 3C protease; see Table 1) were added after the last glutamine residue at the C terminus (Figure 1(c)). This was intended to yield an authentic C terminus after cleavage by rhinovirus 3C protease. The strategy may appear unreasonable, since SARS-CoV and rhinovirus belong to different families (Coronaviridae and Picornaviridae, respectively), but it was chosen for the following reasons. With its extra domain III, SARS-CoV Mpro differs from rhinovirus 3C protease in many respects. However, the two proteases share several similarities in both cleavage sequence and substrate-recognition pocket. The native and complex structures of these two proteases published to date show that the S4, S2, S1, S1′ and S2′ subsites are critical for substrate binding. 12, 14, 18, 32, 33 The corresponding P4, P2, P1, P1′ and P2′ sites were therefore considered when designing recognition sequences. Table 1 lists the C-terminal autocleavage site of SARS-CoV Mpro and the substrate sequence of human rhinovirus 3C protease. Of these five sites, all except P2′ are identical for the two proteases; the substantial difference between the two substrate sequences is limited to P2′.
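The position-by-position comparison behind this design can be sketched in Python. The SARS-CoV Mpro C-terminal site is written as SGVTFQ↓GK, following the text (P6-P1 = SGVTFQ, with Lys at P2′); for rhinovirus 3C protease we assume the widely used recognition sequence LEVLFQ↓GP, since Table 1 itself is not reproduced here. Only the five subsites named above are compared.

```python
# P6..P2' residues of two cleavage sequences (scissile bond between P1 and P1').
POSITIONS = ["P6", "P5", "P4", "P3", "P2", "P1", "P1'", "P2'"]
MPRO_C_TERMINAL = "SGVTFQGK"  # SARS-CoV Mpro C-terminal autocleavage site
HRV_3C_SITE = "LEVLFQGP"      # assumed rhinovirus 3C recognition sequence
KEY_SUBSITES = {"P4", "P2", "P1", "P1'", "P2'"}  # sites considered in the design


def differing_key_sites(a, b):
    """Return {position: (residue_a, residue_b)} for key sites that differ."""
    return {
        pos: (x, y)
        for pos, x, y in zip(POSITIONS, a, b)
        if pos in KEY_SUBSITES and x != y
    }


print(differing_key_sites(MPRO_C_TERMINAL, HRV_3C_SITE))  # {"P2'": ('K', 'P')}

# Replacing Lys with Pro at P2' gives the hybrid octapeptide used in the construct.
hybrid = MPRO_C_TERMINAL[:-1] + "P"
print(hybrid)  # SGVTFQGP
```

With the P2′ substitution, the hybrid sequence matches the assumed 3C site at all five key subsites.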
Hence, we reasoned that rhinovirus 3C protease could recognize the octapeptide substrate SGVTFQ↓GP, whose P1-P6 sites were derived from the C-terminal autoprocessing site of SARS-CoV Mpro, whereas replacing lysine with proline at the P2′ site would arrest autocleavage. Our group has recently solved the crystal structure of a mutant of SARS-CoV Mpro in complex with an 11-residue peptidyl substrate (unpublished results). The complex structure revealed that substituting Lys with Pro at P2′ would cause steric hindrance between the P2′ proline and the main chain of SARS-CoV Mpro, hindering substrate binding. On the basis of this analysis, we hypothesized that rhinovirus 3C protease, but not SARS-CoV Mpro, could recognize the octapeptide substrate SGVTFQ↓GP. SDS-PAGE analysis (see Figure 1(d)) showed that after Ni-affinity chromatography, SARS-CoV Mpro is in the WT-GPH6 form, implying that the N-terminal GST tag was removed efficiently by autocleavage during protein expression in Escherichia coli. The difference between lanes 4 and 5 indicates that the GPH6 tag was indeed removed by rhinovirus 3C protease, confirming that the modified SARS-CoV Mpro substrate was recognized by rhinovirus 3C protease. Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) showed that WT-GPH6 and WT Mpro matched their predicted molecular masses (see Table 2). To ascertain whether the additional residues would interfere with the catalytic activity of SARS-CoV Mpro, we determined the kinetic parameters of GPLGS-WT (with five additional residues (GPLGS) at the N terminus of WT), GS-WT, WT-GPH6 and WT SARS-CoV Mpro (see Table 3). The catalytic efficiency of an enzyme is best defined by kcat/Km. 34 Table 3 shows that the WT protease has the highest cleavage efficiency (kcat/Km = 26,500 M−1 s−1) and GPLGS-WT the lowest (kcat/Km = 167 M−1 s−1).
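The fold differences discussed next follow directly from the kcat/Km values in Table 3. A minimal Python check, using only the two values given in the text (the other Table 3 entries are not reproduced here):

```python
# kcat/Km values (M^-1 s^-1) reported in Table 3 of the text.
kcat_over_km = {"WT": 26_500, "GPLGS-WT": 167}

fold = kcat_over_km["WT"] / kcat_over_km["GPLGS-WT"]
print(f"WT is {fold:.0f}-fold more efficient than GPLGS-WT")  # WT is 159-fold more efficient than GPLGS-WT

# Under Michaelis-Menten kinetics with [S] << Km, the initial rate is
# approximately v = (kcat/Km)[E][S], so at equal enzyme and substrate
# concentrations the ratio of initial rates equals the ratio of kcat/Km values.
```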
The activity of the WT protease with authentic N and C termini was more than 150-fold greater than that of GPLGS-WT and 20-fold greater than that of GS-WT. Furthermore, our results suggest that increasing the number of additional residues at the N terminus causes a greater decrease in activity. In contrast, the activity of WT-GPH6 was about one-third of that of the WT enzyme, which suggests that additional residues at the C terminus have less effect on activity. To shed light on these differences in activity, we determined the crystal structure of WT SARS-CoV Mpro to 1.6 Å resolution. In our published crystal structure of SARS-CoV Mpro (in the GPLGS-WT form), the crystal belongs to space group P21 and the asymmetric unit contains a dimer (with the two protomers designated A and B). Within the dimer, protomer A was in the active form, while protomer B showed an inactive form resulting from a partially collapsed S1 subsite. 14 In contrast, the crystal of WT SARS-CoV Mpro belongs to space group C2, and each asymmetric unit contains only one protomer of a typical dimer (with the two protomers designated A* and B*). The two protomers in the dimer are related by a crystallographic 2-fold axis, and each has a catalytically competent conformation. All residues of the protomer (residues 1-306) were identified in the electron density maps. Apart from differences in the S1 subsite, the SARS-CoV Mpro protomer in the new structure is very similar to each member of the dimer in the original structure (see Figure 2). Compared with the latter structure, the protomer in the new crystal form displays an overall rmsd for Cα atoms of 0.6 Å for protomer A and 0.7 Å for protomer B. In the following discussion, we consider the differences between the substrate-binding sites of the two structures in further detail.
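The overall Cα rmsd values quoted above are root-mean-square deviations over matched Cα positions after the two models have been superposed. A small Python sketch of the deviation itself, assuming the coordinates are already superposed (the least-squares fitting step, for example a Kabsch superposition, is omitted) and using hypothetical coordinates:

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD (Å) between two equal-length lists of (x, y, z) Cα coordinates.

    Assumes both models are already superposed; no fitting is done here.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical example: three Cα atoms, each displaced by 0.6 Å along x
model_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model_b = [(x + 0.6, y, z) for x, y, z in model_a]
print(f"{ca_rmsd(model_a, model_b):.1f} Å")  # 0.6 Å
```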
The S1 site of SARS-CoV Mpro, which has absolute specificity for Gln at P1, consists of the side-chains of His163 and Phe140 and the main-chain atoms of GluA166, Asn142, Gly143 and HisA172. In the S1 site of protomer A in our original structure (GPLGS-WT), the NH group of SerB1 was unable to donate hydrogen bonds simultaneously to the carboxylate group of GluA166 and the main-chain carbonyl group of PheA140 because of the additional residues at the N terminus, although the distances are suitable for hydrogen bond formation. In the newly solved WT structure, however, the amino group (NH2) of SerB*1 in one protomer donates a 3.0 Å hydrogen bond to the carboxylate group of GluA*166 (Figure 3(a)) and a 2.7 Å hydrogen bond to the main-chain carbonyl group of PheA*140, thus stabilizing the S1 pocket. This induces a series of conformational changes. For instance, the NH of GlyA*143, which participates directly in forming the oxyanion hole, moves by 0.8 Å towards the active site; the main chain of residues 142-143 moves towards the S1 subsite; and the side-chain of AsnA*142 flips over with a 6 Å shift. Stabilized by SerB*1, protomer A* displays a more catalytically competent S1 subsite conformation than protomer A in our original structure. In our original structure, the S1 pocket of protomer B is partly collapsed compared with the WT protomer: no electron density was visible for residues A1 and A2; GluB166 reorients to interact with the possibly protonated HisB163; PheB140 undergoes a dramatic conformational change, with the phenyl ring moving by as much as 10 Å; and GlyB143 moves about 3 Å towards the active site, leaving no space to accommodate a tetrahedral reaction intermediate (Figure 3(b)).

Figure 3. Superposition of the S1 pockets of GPLGS-WT and WT SARS-CoV Mpro (in stereo). (a) Superposition of the S1 pockets in protomer A of GPLGS-WT and protomer A* of WT SARS-CoV Mpro. Protomer A* of WT is in blue; protomer A of GPLGS-WT is in yellow; protomer B* of WT is in magenta; protomer B of GPLGS-WT is in red. In the WT structure, the amino group (NH2) of Ser1 in protomer B* donates a 3.0 Å hydrogen bond to the carboxylate group of Glu166 and a 2.7 Å hydrogen bond to the main-chain carbonyl group of Phe140 in protomer A*, stabilizing the S1 pocket. The NH of Gly143 moves 0.8 Å toward the active site; the main chain of residues 142-143 moves toward the S1 subsite; and the side-chain of AsnA*142 flips over with a 6 Å shift compared with protomer A of GPLGS-WT. (b) Superposition of the S1 pockets in protomer B of GPLGS-WT and protomer A* of WT SARS-CoV Mpro. Protomer A* of WT is in blue; protomer B of GPLGS-WT is in yellow; protomer B* of WT is in magenta; protomer A of GPLGS-WT is in red. The S1 pocket of protomer B collapses partly, with reorientation of Glu166 and residues 140-143. No electron density was visible for residues A1 and A2.

These structural variations account for the higher activity of the WT protease compared with the GPLGS-WT and GS-WT proteases. We also observed that the GPLGS-WT protease has lower activity than the GS-WT protease. This might result from the additional flexible residues at the N terminus, which are located close to the active site and would hinder substrate binding. In contrast to the N terminus, the C terminus of the WT protomer (residues 301-306) is located far (∼10 Å) from the substrate-binding pocket of its partner protomer. Therefore, the additional residues (GPH6) at the C terminus are expected to have less effect on enzyme activity (see Table 3). In our previous study, we designed an irreversible anti-coronavirus inhibitor (designated N3, Figure 4(a)) consisting of an α,β-unsaturated ester (a type of Michael acceptor) incorporated with a peptidyl portion.
12 The evaluation of this series of time-dependent inhibitors requires a pseudo-second-order rate constant (k3/Ki), where Ki and k3 represent the equilibrium binding constant and the inactivation rate constant for covalent bond formation, respectively. 12 We assayed the inhibition of N3 against the different types of Mpro to determine whether the additional residues would affect inhibitor binding. In our preliminary inhibition assays, N3 (in fivefold molar excess over the enzyme) completely inactivated the WT and WT-GPH6 proteases after 5 min of preincubation, but not the GPLGS-WT and GS-WT proteases (see Supplementary Data Figure S2). The kinetic parameters listed in Table 4 show that the second-order rate constant of N3 against the WT protease (k3/Ki = 18,800 M−1 s−1) is approximately equal to that against WT-GPH6 (k3/Ki = 15,400 M−1 s−1). However, the second-order rate constant of N3 against GS-WT (k3/Ki = 340 M−1 s−1) 12 is more than 50-fold lower. This difference implies that the first residue of SARS-CoV Mpro also plays an important role in inhibitor binding. We determined the crystal structure of WT SARS-CoV Mpro in complex with inhibitor N3 to 1.9 Å resolution.

Figure 4. In the WT-N3 complex structure, the NH2 group of Ser1 in protomer B* was still hydrogen-bonded to the carboxylate group of Glu166 and the carbonyl group of Phe140 in protomer A*, stabilizing the S1 pocket. In the GPLGS-WT-N3 complex structure, however, these two hydrogen bonds were not found; instead, an ordered water molecule was observed in the S1 pocket. Protomer A* of WT is in blue; protomer A of GPLGS-WT is in yellow; inhibitor N3 (complexed with WT) is in magenta; inhibitor N3 (complexed with GPLGS-WT) is in red; protomer B* of WT is in green; protomer B of GPLGS-WT is in cyan.
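The relationship between the constants in Table 4 can be checked with a short Python sketch. Ki is taken in μM and k3 in s−1; these units are inferred, because k3/Ki then reproduces the tabulated M−1 s−1 values within their errors. The `k_obs` function assumes the common two-step model for irreversible inhibition, kobs = k3[I]/(Ki + [I]); the text does not state which rate equation was fitted, so treat it as illustrative only.

```python
# Table 4 values: (Ki in uM, k3 in s^-1); units inferred, see lead-in.
TABLE4 = {
    "GS-WT": (9.0, 0.0031),
    "WT-GPH6": (2.3, 0.034),
    "WT": (1.9, 0.035),
}


def k3_over_ki(ki_um, k3_per_s):
    """Pseudo-second-order inactivation constant k3/Ki in M^-1 s^-1."""
    return k3_per_s / (ki_um * 1e-6)


def k_obs(k3_per_s, ki_um, inhibitor_um):
    """Observed first-order inactivation rate (s^-1) under the assumed
    two-step model E + I <-> E.I -> E-I: kobs = k3[I] / (Ki + [I])."""
    return k3_per_s * inhibitor_um / (ki_um + inhibitor_um)


for name, (ki, k3) in TABLE4.items():
    print(f"{name}: k3/Ki = {k3_over_ki(ki, k3):,.0f} M^-1 s^-1")

# Fold difference between WT and GS-WT using the tabulated k3/Ki values
print(f"WT / GS-WT = {18_800 / 340:.0f}-fold")
```

The recomputed ratios (about 344, 14,800 and 18,400 M−1 s−1) fall within the stated errors of the tabulated 340, 15,400 and 18,800 M−1 s−1, and the WT/GS-WT ratio of about 55 is consistent with the "more than 50-fold" difference in the text.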
The main difference between the structures of the WT complex and the previously reported GPLGS-WT complex 12 again lies in the S1 subsite. In the GPLGS-WT complex structure, N3 binds to protomers A and B of SARS-CoV Mpro in an identical and normal manner, 12 so we discuss only protomers A and A*. In protomer A of GPLGS-WT Mpro complexed with N3, the NH group of SerB1 is 3.4 Å from the carboxylate group of GluA166 and 4.7 Å from the main-chain carbonyl group of PheA140, both beyond hydrogen-bonding distance. Instead, an ordered water molecule is situated at the bottom of the S1 subsite, connecting N3 and the protease. This water donates two hydrogen bonds, to the carboxylate group of GluA166 and the main-chain carbonyl group of PheA140, and accepts two hydrogen bonds, from the lactam and the side-chain of HisA172. Although the water molecule helps to stabilize inhibitor binding in the S1 pocket, it occupies part of the S1 subsite, and the resulting steric hindrance prevents the lactam from inserting further into the subsite. In protomer A* of WT Mpro complexed with N3, no water molecule was found at the bottom of the S1 pocket. As a consequence, the carboxylate group of GluA*166 moves 1.7 Å upwards to form a 2.8 Å hydrogen bond with the NH of the lactam. The competent conformation of the S1 subsite is still maintained via the interaction of the NH2 group of SerB*1 with the carboxylate group of GluA*166 (Figure 4(b)) and the main-chain carbonyl group of PheA*140. These structural data demonstrate that Ser1 is also important for inhibitor binding, and account for the more potent inhibition of the WT protease by N3 compared with the GPLGS-WT and GS-WT proteases. Although SARS-CoV is notorious for causing a lethal disease in humans, some benefit may come from this life-threatening virus.
In this study, SARS-CoV Mpro was engineered to serve as a novel endopeptidase for removing fusion tags in recombinant protein overproduction. Table 5 compares SARS-CoV Mpro with other routinely used proteases in terms of production method, substrate specificity and cleavage efficiency. Human thrombin and bovine factor Xa both possess a high level of cleavage efficiency, but they are extracted from plasma, which requires more complicated procedures. In addition, they are not as selective for substrate as viral proteases. WT SARS-CoV Mpro, which can be readily overexpressed in E. coli, is highly substrate-specific. Furthermore, its activity (kcat/Km = 26,500 M−1 s−1) is superior to that of rhinovirus and TEV proteases. Therefore, SARS-CoV Mpro is a suitable candidate for site-specific cleavage in fusion protein expression systems. The cleavage efficiency of WT-GPH6 remains high despite the additional residues at the C terminus. Highly purified Mpro in this form can be obtained by a simple one-step purification, as shown by the overloaded SDS-PAGE gel (Figure 1(d)). Because of its six additional histidine residues at the C terminus, it can be readily separated from the protein of interest by affinity chromatography after removal of the fusion tags. These advantages suggest that this form of SARS-CoV Mpro could have important industrial applications. A common drawback of recombinant protein expression is that additional amino acid residues are often left at the termini of the wild-type protein. For SARS-CoV Mpro, several research groups have reported inconsistent kinetic parameters, with kcat/Km ranging from 20 to 29,000 M−1 s−1. 19, 20, [25] [26] [27] The slight differences in methods (HPLC- and FRET-based) and substrates used cannot fully account for this discrepancy.
The first crystal structure of SARS-CoV Mpro (with additional residues at the N terminus), published by our group, provided clues that the first residue might play an important role in substrate binding. To clarify this point, we designed a new strategy to produce WT SARS-CoV Mpro. We created a GST fusion product with a tag that can be removed via the autocleavage mechanism of this enzyme, for three reasons: (1) autoprocessing does not produce additional residues at the N terminus; (2) SARS-CoV Mpro has been reported to be expressed highly efficiently in GST fusion systems; and (3) this method could be used to characterize the autocleavage efficiency of SARS-CoV Mpro in vitro. As for point (3), the GST affinity tag mimics the transmembrane domain upstream of SARS-CoV Mpro in polyproteins 1a and 1ab, providing an in vitro autocleavage model of Mpro. SDS-PAGE analysis shows that the GST tag was removed entirely from the N terminus through autoprocessing during expression of SARS-CoV Mpro in E. coli, demonstrating its highly efficient autocleavage in vitro. In previous studies, it was hypothesized that autocleavage of Mpro may occur in proximity to the membrane. 13 Our data suggest that autocleavage might be completed in the cytoplasm, although the precise location of assembly of the replicase complex components remains to be identified. Separate reports by Hsu and Lin, 35,36 in which autocleavage was used to remove tags such as thioredoxin, further support our hypothesis.

Table 4. Enzyme inhibition data of inhibitor N3 against four types of SARS-CoV Mpro
Enzyme | Ki (μM) | k3 (s−1) | k3/Ki (M−1 s−1)
GS-WT | 9.0 ± 0.8 | 0.0031 ± 0.0005 | 340 ± 27
WT-GPH6 | 2.3 ± 0.1 | 0.034 ± 0.001 | 15,400 ± 1,200
WT | 1.9 ± 0.1 | 0.035 ± 0.002 | 18,800 ± 1,800

Rhinoviruses and coronaviruses belong to the Picornaviridae and the Coronaviridae, respectively. No previous report has shown that any CoV Mpro could efficiently process the substrate of picornavirus 3C proteases, or vice versa.
In our study, it is interesting to observe that substitution at the P2′ site alone converted a SARS-CoV main protease substrate into a rhinovirus 3C protease substrate. We have also demonstrated that Ser1 at the N terminus of SARS-CoV Mpro is important for its activity and for inhibitor binding. The critical interactions that stabilize the substrate-binding pocket are two hydrogen bonds formed by the free amino group of the first residue of one protomer with two residues, Phe140 and Glu166, of the S1 subsite of its partner protomer within the dimer. The stabilizing effect of the free amino group is evident: in our published GPLGS-WT structure, the S1 subsite of protomer B is partially collapsed without stabilization by the first residue of protomer A, suggesting that the S1 pocket is less stable because of micro-environmental changes. During crystallization, large quantities of WT SARS-CoV Mpro crystals could be produced overnight, and these readily diffracted to very high resolution under the conditions described previously. In contrast, it usually took five to ten days to produce a very limited number of GPLGS-WT or GS-WT SARS-CoV Mpro crystals suitable for diffraction, and these diffracted to comparatively lower resolution. Consequently, the WT SARS-CoV Mpro reported here will be more suitable for activity assays, inhibitor screening and crystallization, and it should accelerate the development of anti-coronavirus inhibitors through structure-assisted drug design.

The 654-965 region of the pGEX-6p-1 vector was cloned into the pMD18-T vector with two primers: forward, 5′-TTCGAAGATCGTTTATGTCATAAA-3′; reverse, 5′-GGATCCTTTCCTAAAACCACTCTGCAGAACTGCACTAGTATCCGATTTTGGAGGATG-3′. The 33 nucleotides encoding the 11 amino acids TSAVLQSGFRK recognized by SARS-CoV Mpro were introduced by the reverse primer.
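That the reverse primer carries the 33 nucleotides encoding TSAVLQSGFRK can be checked by reverse-complementing and translating the relevant stretch. This Python sketch assumes the coding segment begins immediately after the 5′ BamHI site (GGATCC) and uses the standard genetic code.

```python
# Standard genetic code, indexed in the conventional TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons of a DNA sequence into one-letter amino acids."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


reverse_primer = "GGATCCTTTCCTAAAACCACTCTGCAGAACTGCACTAGTATCCGATTTTGGAGGATG"
# Assumption: the 33-nt coding insert follows the 6-nt BamHI site directly.
insert = reverse_primer[6:6 + 33]
print(translate(revcomp(insert)))  # TSAVLQSGFRK
```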
The recombinant pMD18-T plasmid was double-digested with BstBI and BamHI. The fragment of interest was ligated into the pGEX-6p-1 vector, which had also been double-digested with BstBI and BamHI, to construct the new protein expression vector pGSTM.

Cloning, expression and purification of calbindin D28k by the pGSTM protein expression system

The gene encoding calbindin D28k was amplified by the polymerase chain reaction (PCR) with primers: The PCR products were ligated into pMD18-T vectors with bacteriophage T4 DNA ligase. After digestion with BamHI and XhoI, the gene of interest was inserted between the BamHI and XhoI sites of the pGSTM vector. The resulting recombinant plasmid was transformed into E. coli strain BL21 (DE3). The cells were cultured in LB medium containing 0.1 mg ml−1 ampicillin. When the absorbance at 600 nm (A600) reached 0.6, IPTG was added to 0.5 mM and the culture was incubated at 16°C for 10 h. After harvesting by centrifugation at 4600g (Beckman JLA-10-5), the pellet was resuspended in PBS (140 mM NaCl, 2.7 mM KCl, 10 mM Na2HPO4, 1.8 mM KH2PO4, pH 7.3) and sonicated on ice. The lysate was centrifuged at 27,000g (Beckman JA-25-50) for 30 min and the precipitate was discarded. The supernatant was loaded onto 2 ml GST-glutathione affinity columns (Pharmacia) equilibrated with PBS and washed with 30 column volumes of PBS. Then, 0.1 mg of SARS-CoV Mpro (GS-WT) was added to the column and incubated at 4°C for 12 h to remove the GST tag. The protein of interest was collected and analyzed by SDS-PAGE.

Expression and purification of SARS-CoV Mpro with different numbers of additional residues (5, 2 and 0) at the N terminus are described below.

SARS-CoV Mpro (with five additional residues, GPLGS, at the N terminus)

Expression and purification of GPLGS-WT SARS-CoV Mpro has been reported. 14 Briefly, the coding sequence of SARS-CoV Mpro was inserted into the BamHI and XhoI sites of pGEX-6p-1 plasmid DNA (GE Healthcare).
The resulting plasmid was used to transform E. coli BL21 (DE3) cells. The GST fusion protein, GST-SARS-CoV Mpro, was purified by GST-glutathione affinity chromatography and cleaved with GST-tagged rhinovirus 3C protease, and the recombinant SARS-CoV Mpro was further purified by anion-exchange chromatography.

SARS-CoV Mpro (with two additional residues, GS, at the N terminus)

Expression and purification of GS-WT SARS-CoV Mpro has been reported. 12 Briefly, the coding sequence was inserted into the BamHI and XhoI sites of the pGEX-4T-1 vector (GE Healthcare). The subsequent procedure is similar to that used for SARS-CoV Mpro expressed from the pGEX-6p-1 plasmid, except that the GST fusion protein was cleaved by thrombin.

WT SARS-CoV Mpro (without additional residues at the termini)

The coding sequence for SARS-CoV Mpro was amplified by PCR using the PCR primers: The 12 nucleotides coding for the four amino acids AVLQ (corresponding to the P4-P1 autocleavage sites at the N terminus of SARS-CoV Mpro; the nomenclature for substrate residues is Pn, …, P2, P1, P1′, P2′, …, Pn′, where P1-P1′ denotes the hydrolyzed bond, and Sn, …, S2, S1, S1′, S2′, …, Sn′ denote the corresponding enzyme binding sites 37) were added before Ser1. The 24 nucleotides coding for the eight amino acids GPH6 were added at the C terminus by the reverse primer. The PCR products were inserted into the BamHI and XhoI sites of the pGEX-6p-1 plasmid (GE Healthcare). The resulting plasmid was then used to transform E. coli BL21 (DE3) cells. The sequence of the insert was verified by dideoxynucleotide sequencing. Positive clones harboring the recombinant plasmid were grown to an A600 of 0.6 at 37°C with shaking in LB medium containing 0.1 mg ml−1 ampicillin. Expression of the GST fusion protein was then induced with 0.5 mM IPTG, and incubation was continued at 16°C for 10 h.
Cells were then harvested by centrifugation at 4600g (Beckman JLA-10-5), resuspended in lysis buffer (20 mM Tris-HCl (pH 8.0), 300 mM NaCl) and sonicated on ice. The lysate was centrifuged at 27,000g (Beckman JA-25-50) for 30 min and the supernatant was collected. The His-tagged protein was purified by Ni-NTA affinity chromatography and concentrated in PreScission Cleavage Buffer (50 mM Tris-HCl (pH 7.0), 150 mM NaCl, 1 mM EDTA, 1 mM dithiothreitol). To cleave the C-terminal His tag, 50 μl of 3 mg ml−1 human rhinovirus 3C protease was added to 1 ml of the 10 mg ml−1 WT-GPH6 solution, producing SARS-CoV Mpro with an authentic C terminus. WT SARS-CoV Mpro was further purified by anion-exchange chromatography. Protein samples from each step were prepared for SDS-PAGE analysis. WT-GPH6 and WT were analyzed by MALDI-TOF MS. The purified and concentrated WT SARS-CoV Mpro (10 mg ml−1) was stored in 50 mM Tris-HCl (pH 7.3), 1 mM EDTA at −80°C for enzyme activity assays and crystallization.

Enzyme activity assays of GS-WT SARS-CoV Mpro have been described. 12 Activity assays of GPLGS-WT, WT-GPH6 and WT SARS-CoV Mpro followed a similar protocol. Briefly, the substrate used with the N- and C-terminally authentic SARS-CoV Mpro was the fluorogenic compound MCA-AVLQSGFR-Lys(Dnp)-Lys-NH2 (>95% purity, GL Biochem Shanghai Ltd, Shanghai, China). The excitation and emission wavelengths of the fluorogenic substrate were 320 nm and 405 nm, respectively. Assays were performed at 30°C in a buffer consisting of 50 mM Tris-HCl (pH 7.3), 1 mM EDTA. The reaction was initiated by adding protease (final concentration 0.2 μM for WT and WT-GPH6, and 2 μM for GPLGS-WT) to solutions containing different final concentrations of substrate (3.2-40 μM for WT and WT-GPH6, and 6.4-80 μM for GPLGS-WT). The kinetic constants Km and kcat were obtained from a double-reciprocal plot (Supplementary Data Figure S1).
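The double-reciprocal (Lineweaver-Burk) analysis used to obtain Km and kcat can be sketched in plain Python. The Km and kcat values below are hypothetical, chosen only to generate noise-free rates over the 3.2-40 μM substrate range and the 0.2 μM enzyme concentration described above; the fit recovers them from the linear relation 1/v = (Km/Vmax)(1/[S]) + 1/Vmax, with kcat = Vmax/[E].

```python
E0 = 0.2e-6               # enzyme concentration (M), as used for WT in the text
KM_HYPOTHETICAL = 20e-6   # M, illustrative only
KCAT_HYPOTHETICAL = 0.5   # s^-1, illustrative only


def michaelis_menten(s, km, vmax):
    """Initial rate v = Vmax [S] / (Km + [S])."""
    return vmax * s / (km + s)


def lineweaver_burk(substrate, rates, e0):
    """Least-squares fit of 1/v against 1/[S]; returns (Km, kcat)."""
    xs = [1.0 / s for s in substrate]
    ys = [1.0 / v for v in rates]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # = Km / Vmax
    intercept = mean_y - slope * mean_x  # = 1 / Vmax
    vmax = 1.0 / intercept
    return slope * vmax, vmax / e0


substrate = [3.2e-6, 6.4e-6, 12.8e-6, 25.6e-6, 40e-6]  # M
rates = [michaelis_menten(s, KM_HYPOTHETICAL, KCAT_HYPOTHETICAL * E0) for s in substrate]
km, kcat = lineweaver_burk(substrate, rates, E0)
print(f"Km = {km * 1e6:.1f} uM, kcat = {kcat:.2f} s^-1, kcat/Km = {kcat / km:,.0f} M^-1 s^-1")
```

Because taking reciprocals magnifies errors at low substrate concentrations, a direct nonlinear fit of the Michaelis-Menten equation is often preferred for noisy data; the linear form is shown here because it is the method named in the text.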
Strict kinetic parameters were determined for the inhibition assay. 12

Crystallization, data collection and structure determination

Crystallization of WT SARS-CoV Mpro was carried out as described. 14 The preparation of co-crystals of SARS-CoV Mpro in complex with the inhibitor N3 has been reported. 12 A data set for WT SARS-CoV Mpro was collected from a single crystal on beamline BL19-ID of the Advanced Photon Source (APS), Argonne National Laboratory, at a wavelength of 1.00 Å. Data for the SARS-CoV Mpro complex with N3 were collected at 100 K in-house on a Rigaku CuKα rotating-anode X-ray generator (MM007) operated at 40 kV and 20 mA (1.5418 Å), with a Rigaku R-AXIS IV++ image-plate detector. Data were processed, integrated, scaled and merged using HKL2000. 38 The structures were determined as described. 12 Briefly, they were solved by molecular replacement using our native structure of SARS-CoV Mpro at pH 7.6 (PDB ID 1UK3) as the search model. Data collection and structure refinement statistics are summarized in Table 6. Coordinates and structure factors for WT SARS-CoV Mpro and for its complex with the inhibitor N3 have been deposited in the Protein Data Bank with accession numbers 2H2Z and 2HOB, respectively.

Footnotes to Table 6: a $R_{\mathrm{merge}} = \sum |I_i - \langle I \rangle| / \sum I_i$, where $I_i$ is the intensity of an individual measurement i of a reflection and $\langle I \rangle$ is the average intensity of that reflection. b $R_{\mathrm{work}} = \sum \big| |F_p| - |F_c| \big| / \sum |F_p|$, where $F_p$ is the observed and $F_c$ the calculated structure factor amplitude. c Ramachandran plots were generated with the program PROCHECK.
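The Rmerge and Rwork definitions in the footnotes above can be expressed as a short Python sketch. It assumes that symmetry-equivalent observations have already been mapped to a common hkl key and that Fc has already been scaled to Fp, steps normally handled by the crystallographic software; the function names and toy data are illustrative.

```python
from __future__ import annotations
from collections import defaultdict
from collections.abc import Hashable, Iterable

def r_merge(observations: Iterable[tuple[Hashable, float]]) -> float:
    """R_merge = sum|I_i - <I>| / sum I_i over all observations.

    Each observation is (hkl, intensity); symmetry-equivalent observations
    are assumed to share the same hkl key already."""
    groups: dict[Hashable, list[float]] = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)
    numerator = denominator = 0.0
    for intensities in groups.values():
        mean_i = sum(intensities) / len(intensities)  # <I> for this reflection
        numerator += sum(abs(i - mean_i) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator

def r_work(f_p: list[float], f_c: list[float]) -> float:
    """R = sum ||F_p| - |F_c|| / sum |F_p|; F_c is assumed already scaled to F_p."""
    numerator = sum(abs(abs(fp) - abs(fc)) for fp, fc in zip(f_p, f_c))
    return numerator / sum(abs(fp) for fp in f_p)

if __name__ == "__main__":
    # Hypothetical toy data: two unique reflections, each measured twice.
    obs = [((1, 0, 0), 100.0), ((1, 0, 0), 110.0), ((0, 1, 0), 50.0), ((0, 1, 0), 50.0)]
    print(f"R_merge = {r_merge(obs):.3f}")                           # 10 / 310
    print(f"R_work  = {r_work([100.0, 200.0], [90.0, 220.0]):.3f}")  # 30 / 300
```

Rfree uses the same formula as Rwork, evaluated on a small set of reflections excluded from refinement.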
1. Protein production: feeding the crystallographers and NMR spectroscopists
2. Differential effects of short affinity tags on the crystallization of Pyrococcus furiosus maltodextrin-binding protein
3. Efficient and rapid affinity purification of proteins using recombinant fusion proteases
4. A continuous colorimetric assay for rhinovirus-14 3C protease using peptide p-nitroanilides as substrates
5. A new vector for high-throughput, ligation-independent cloning encoding a tobacco etch virus protease cleavage site
6. Tobacco etch virus protease: mechanism of autolysis and rational design of stable mutants with wild-type catalytic proficiency
7. Identification of a novel coronavirus in patients with severe acute respiratory syndrome
8. A novel coronavirus associated with severe acute respiratory syndrome
9. Newly discovered coronavirus as the primary cause of severe acute respiratory syndrome
10. Coronavirus as a possible cause of severe acute respiratory syndrome
11. Aetiology: Koch's postulates fulfilled for SARS virus
12. Design of wide-spectrum inhibitors targeting coronavirus main proteases
13. Coronavirus main proteinase (3CLpro) structure: basis for design of anti-SARS drugs
14. The crystal structures of severe acute respiratory syndrome virus main protease and its complex with an inhibitor
15. Structure of the SARS coronavirus main proteinase as an active C2 crystallographic dimer
16. pH-dependent conformational flexibility of the SARS-CoV main proteinase (M(pro)) dimer: molecular dynamics simulations and multiple X-ray structure analyses
17. Understanding the maturation process and inhibitor design of SARS-CoV 3CLpro from the crystal structure of C145A in a product-bound form
18. Crystal structures of the main peptidase from the SARS coronavirus inhibited by a substrate-like aza-peptide epoxide
19. Identification of novel inhibitors of the SARS coronavirus main protease 3CLpro
20. Biosynthesis, purification, and substrate specificity of severe acute respiratory syndrome coronavirus 3C-like proteinase
21. Quaternary structure of the severe acute respiratory syndrome (SARS) coronavirus main protease
22. Critical assessment of important regions in the subunit association and catalytic action of the severe acute respiratory syndrome coronavirus main protease
23. Severe acute respiratory syndrome coronavirus 3C-like proteinase N terminus is indispensable for proteolytic activity but not for enzyme dimerization. Biochemical and thermodynamic investigation in conjunction with molecular dynamics simulations
24. The coronavirus replicase
25. 3C-like proteinase from SARS coronavirus catalyzes substrate hydrolysis by a general base mechanism
26. Characterization of SARS-CoV main protease and identification of biologically active small molecule inhibitors using a continuous fluorescence-based assay
27. High-throughput screening identifies inhibitors of the SARS coronavirus main proteinase
28. Distribution of parvalbumin-, calretinin-, and calbindin-D28k-immunoreactive neurons and fibers in the human entorhinal cortex
29. Distribution of calretinin, calbindin-D28k and parvalbumin in the hypothalamus of the squirrel monkey
30. Distribution of calretinin, calbindin D28k, and parvalbumin in subcellular fractions of rat cerebellum: effects of calcium
31. Myo-inositol monophosphatase is an activated target of calbindin D28k
32. Structure-assisted design of mechanism-based irreversible inhibitors of human rhinovirus 3C protease with potent antiviral activity against multiple rhinovirus serotypes
33. Structure of human rhinovirus 3C protease reveals a trypsin-like polypeptide fold, RNA-binding site, and means for cleaving precursor polyprotein
34. Enzymes
35. Mechanism of the maturation process of SARS-CoV 3CL protease
36. Characterization of trans- and cis-cleavage activity of the SARS coronavirus 3CLpro protease: basis for the in vitro screening of anti-SARS drugs
37. On the size of the active site in proteases
38. Processing of X-ray diffraction data collected in oscillation mode
39. Study of the specificity of thrombin with tripeptidyl-p-nitroanilide substrates
40. Single-step purification of polypeptides expressed in Escherichia coli as fusions with glutathione S-transferase
41. Thrombin specificity. Requirement for apolar amino acids adjacent to the thrombin cleavage site of polypeptide substrate
42. Active-site mapping of bovine and human blood coagulation serine proteases using synthetic peptide 4-nitroanilide and thio ester substrates

We thank Xuemei Li and Sheng Ye for technical assistance; Huanming Yang, Jian Wang, and Jun Yu for providing cDNA of SARS-CoV Mpro; and Rongguang Zhang and Andrzej Joachimiak for collecting the WT SARS-CoV Mpro data. This work was supported by Project 973 of the Ministry of Science and Technology of China (grant number 2004BA519A30), the NSFC (grant number 30221003), the Sino-German Center (grant number GZ236(202/9)), the Sino-European Project on SARS Diagnostics and Antivirals (SEPSDA) of the European Commission (grant number 003831) and the Tsinghua University PhD student innovation fund.

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.jmb.2006.11.073