key: cord-0852282-6fsz9iy7
authors: Saikatendu, Kumar Singh; Joseph, Jeremiah S.; Subramanian, Vanitha; Clayton, Tom; Griffith, Mark; Moy, Kin; Velasquez, Jeffrey; Neuman, Benjamin W.; Buchmeier, Michael J.; Stevens, Raymond C.; Kuhn, Peter
title: Structural Basis of Severe Acute Respiratory Syndrome Coronavirus ADP-Ribose-1″-Phosphate Dephosphorylation by a Conserved Domain of nsP3
date: 2005-11-08
journal: Structure
DOI: 10.1016/j.str.2005.07.022
sha: 06eb4e38a1b026571d50fdb5b5ce4eacb39b0a2a
doc_id: 852282
cord_uid: 6fsz9iy7

The crystal structure of a conserved domain of nonstructural protein 3 (nsP3) from severe acute respiratory syndrome coronavirus (SARS-CoV) has been solved by single-wavelength anomalous dispersion to 1.4 Å resolution. The structure of this “X” domain, seen in many single-stranded RNA viruses, reveals a three-layered α/β/α core with a macro-H2A-like fold. The putative active site is a solvent-exposed cleft that is conserved in its three structural homologs, yeast Ymx7, Archaeoglobus fulgidus AF1521, and Er58 from E. coli. Its sequence is similar to yeast YBR022W (also known as Poa1p), a known phosphatase that acts on ADP-ribose-1″-phosphate (Appr-1″-p). The SARS nsP3 domain readily removes the 1″ phosphate group from Appr-1″-p in in vitro assays, confirming its phosphatase activity. Sequence and structure comparison of all known macro-H2A domains, combined with available functional data, suggests that proteins of this superfamily form an emerging group of nucleotide phosphatases that dephosphorylate Appr-1″-p.

Severe acute respiratory syndrome (SARS) emerged as the first severe and readily transmissible new disease of the 21st century and caused 8000 infections and more than 800 deaths in 2003 (Groneberg et al., 2003). The causative organism is a new coronavirus (SARS-CoV) that is distantly related to group II coronaviruses. The virus has a single-stranded RNA genome of ~29.7 kb that encodes at least 14 putative open reading frames (ORFs) (Peiris et al., 2003; Drosten et al., 2003) (Figure 1A). Two-thirds of the viral genome at the 5′ end is organized as a single highly conserved ORF, known as ORF-1a/1ab, that is translated into two large polyproteins, pp1a and pp1ab (Ksiazek et al., 2003). Translation of pp1ab involves ribosomal frameshifting, a feature also seen in many other coronaviruses (Baranov et al., 2005; Snijder et al., 2003). Termed the “replicase polyproteins,” pp1a and pp1ab are subsequently posttranslationally cleaved by two virus-encoded proteases, the 3C-like protease (the main protease or 3CL-Pro) and the papain-like cysteine protease (PLP), into 16 mature protein products (Figure 1A). These “nonstructural proteins” or nsPs (nsP1-nsP16) form a giant replicase complex that participates in numerous functions during viral infection, such as replication of the RNA genome, processing of subgenomic RNA, and packaging of newly budding virions (Ziebuhr et al., 2001).

The third of these nonstructural proteins, nsP3, is a large multidomain protein of 1922 amino acids that spans residues 819-2740 of pp1a (NP_828862; gi: 34555776). Mature nsP3 results from proteolytic cleavage of pp1a at two sites (⁸¹⁸GA⁸¹⁹ and ²⁷⁴⁰GK²⁷⁴¹) by the papain-like proteinase. nsP3 has conserved sequence motifs of six independent domains: (1) an N-terminal Glu-rich acidic domain; (2) an X domain with predicted Appr-1″-p processing activity; (3) a SUD domain (SARS-specific unique domain); (4) a peptidase C-16 domain that contains the papain-like protease; (5) a transmembrane domain; and (6) the Y domain (Figure 1A).
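The boundaries quoted above imply a fixed offset between nsP3 numbering and pp1a numbering. The short Python sketch below makes that arithmetic explicit. The function names are hypothetical, and the example assumes that the expressed construct (residues 184-365 of nsP3, as given in the cloning procedure) is numbered from the first residue of mature nsP3; the source does not state this explicitly.

```python
# Boundaries of mature nsP3 within polyprotein pp1a, as quoted in the text.
NSP3_START_IN_PP1A = 819
NSP3_END_IN_PP1A = 2740


def nsp3_length() -> int:
    """Number of residues in mature nsP3 (inclusive boundaries)."""
    return NSP3_END_IN_PP1A - NSP3_START_IN_PP1A + 1


def nsp3_to_pp1a(residue: int) -> int:
    """Convert a 1-based nsP3 residue number to pp1a numbering.

    Assumption: nsP3 numbering starts at 1 on pp1a residue 819.
    """
    if not 1 <= residue <= nsp3_length():
        raise ValueError(f"residue {residue} lies outside nsP3")
    return NSP3_START_IN_PP1A + residue - 1


def construct_length(first: int, last: int) -> int:
    """Length of a construct given inclusive first and last residues."""
    return last - first + 1
```

Under these assumptions, `nsp3_length()` reproduces the 1922 residues quoted above, and `construct_length(184, 365)` reproduces the 182-residue insert.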

The conserved X domain of nsP3 has been predicted to house a putative adenosine diphosphate ribose 1″-phosphatase (ADRP) function and is annotated in domain classification databases such as SMART (Letunic et al., 2004) and Interpro (Mulder et al., 2003) as a member of the A1pp superfamily that includes more than 300 proteins from archaea, bacteria, eukaryotes, and single-stranded, positive-sense RNA viruses. Structures of three homologs from this superfamily, yeast Ymx7, E. coli Er58, and a conserved C-terminal domain of nonhistone macro-H2A from Archaeoglobus fulgidus (AF1521), show that they all adopt a generic macro-H2A-like fold with minor variations. While the function of some members of this superfamily, like the human poly-(ADP-ribose) polymerase, has been experimentally characterized (D'Amours et al., 1999), that of many others is yet to be determined.

As a part of an integrated program to study emerging infectious diseases, we have undertaken the structural and functional characterization of all of the major protein products in SARS-CoV by using a multipronged approach. This included a detailed bioinformatics analysis of the SARS-Tor2 genome by using sensitive profile-based methods like PSI-BLAST (Altschul et al., 1997), FFAS (Rychlewski et al., 2000), and HMMER (Eddy, 1996) for detection of remote homologs, identification of domain boundaries in multidomain proteins, and functional annotation. Based on this analysis, 173 constructs that cover the entire proteome were designed and cloned into vectors for overexpression in E. coli and baculovirus systems (http://sars.scripps.edu).

In this study, we present the first of the crystal structures from this effort, that of the highly conserved putative phosphatase domain of nsP3. To our knowledge, this is the first crystal structure of this domain from positive-sense, single-stranded RNA viruses. It reveals a close structural relationship with prototypical macro-H2A-like fold proteins. One of its sequence homologs, Poa1p (YBR022) from Saccharomyces cerevisiae, was recently functionally characterized as a highly specific phosphatase that removes the 1″ phosphate group of ADP-ribose-1″-phosphate (Appr-1″-p) in the latter half of the tRNA splicing pathway in yeast (Shull et al., 2005), hinting at a similar substrate specificity for SARS ADRP. Using an in vitro assay, we experimentally validate that this ADRP domain of SARS nsP3 is indeed a phosphatase that removes the terminal 1″ phosphate from Appr-1″-p. To our knowledge, these results, combined with recently elucidated structures of two hypothetical proteins, suggest that a majority of macro-H2A fold members form a new family of nucleotide phosphatases.

Description of the ADRP Domain of SARS nsP3

The cloned insert contains 182 residues from nsP3 of SARS-CoV-Tor2 and has a monomer molecular weight of 19,523 Da and a pI of 6.9. The final structural model refined against crystallographic data to 1.4 Å resolution has four subunits in the asymmetric unit in very similar conformations (rmsds < 0.4 Å for 166 Cα atoms). We do not observe electron density for a few residues at the C terminus of each of the four monomers. These include 5 residues in chain D, 15 residues in chain A, and 9 residues each in the B and C monomers. The final refinement statistics and stereochemical parameters of the structure are listed in Table 1. Overall, each subunit consists of eight β strands and six α helices (Figure 2A). Strands 2-8 form a central seven-stranded β sheet that has a strand order of 2387465. The outermost strands on either side are antiparallel to the rest. The six helices straddle the central β sheet to form a three-layered α/β/α topology. Two of the subunits in the asymmetric unit form a loosely packed head-to-head dimer (Figure 2B). A short loop connecting strand 6 and helix H4 is involved in weak interfacial contacts with the conserved Gly-rich segment of the other monomer. The interface is fairly small at ~870 Å² (435 Å² per monomer) and predominantly nonpolar (60%). Residues that form the putative active site lie close to the dimer interface. The enzyme elutes as a homodimer in gel-filtration studies (data not shown), indicating that the physiologically relevant form of this protein may be dimeric.

Comparison of one of the chains of SARS ADRP against all known structures in the PDB by using DALI (Holm and Sander, 1993) revealed the presence of two structural homologs: a hypothetical protein from Archaeoglobus fulgidus, AF1521 (PDB code: 1HJZ; z score of 18.1; rmsd of 2.4 Å for 152 superimposed Cα atoms; pairs with a z score > 3.0 are considered structurally similar), , 1995). Both structures are members of the “macrodomain-like” fold as defined in the SCOP database (Murzin et al., 1995). This fold includes two other structural homologs from E. coli, aminopeptidase A (PepA) and a hypothetical protein ymbD (Northeast Structural Genomics Consortium target Er58; PDB code: 1SPV). The topological connectivity of the secondary structural elements of these four proteins along with the ADRP domain of SARS nsP3 is shown in a similar orientation in Figure 3 (A, B, D, E, and F). All of them share the same three-layered α/β/α core, with minor variations. They have a mixed β sheet of six strands with strand order 165243. The first strand and the first helix are absent in bovine lens leucine aminopeptidase (BlLAP, Figure 3A). AF1521 has two insertions to this core, a β strand inserted at the N terminus and an α helix between strands 3 and 4 (Figure 3E). The SARS ADRP domain has two β strands inserted at the N terminus, one of which forms part of the central β sheet (Figure 3F). The sixth protein structure in Figure 3, Ymx7 from yeast (PDB code: 1TXZ), is a member of this fold and has a circular permutation (Figure 3C). The first strand and the first helix of this protein occupy structural positions that correspond to the last β strand and the C-terminal helix (H6) of a canonical macro-like fold.

BlLAP is an exopeptidase that cleaves amino acids from the N terminus of polypeptides (Burley et al., 1990). E. coli PepA is a DNA binding protein that is involved in Xer site-specific recombination and transcriptional control of the carAB operon (Sträter et al., 1999). Although both share significant similarities in sequence (31% identity) and structure (both are homohexameric with a dinuclear Zn²⁺ in their active site), they have widely different functions. The peptidase activity is not needed by PepA to function during Xer-specific recombination (McCulloch et al., 1994) or during repression of carAB transcription (Charlier et al., 1995). On the other hand, BlLAP does not have any demonstrated DNA binding function. AF1521 is a stand-alone macrodomain from Archaeoglobus fulgidus and is a close homolog of the C-terminal nonhistone domain of the largest variant of histone H2A (Pehrson and Fried, 1992; Pehrson and Fuji, 1998). It is evolutionarily related to P loop-containing nucleotide triphosphate hydrolases. The structure has been solved in its apo form (Allen et al., 2003) and in complex with two ligands, Mg²⁺-ADP (PDB code: 2BFR, unpublished) and ADP-ribose (PDB code: 2BFQ; unpublished). Yeast Ymx7 is a conserved hypothetical protein from the ADH3-RCA1 intergenic region. Although its function has not been experimentally demonstrated, it has been annotated as an ADP-ribose-1″-monophosphatase (ADRP) based on its sequence similarity to known ADRPs (Kumaran et al., 2005). Finally, the structure of a conserved hypothetical protein, Er58, from E. coli that was solved by the Northeast Structural Genomics Consortium (PDB code: 1SPV, unpublished) reveals a canonical macro-like fold. Its function remains unknown.

It would thus appear that the five known members of this fold fall into two broad functional groups, one containing BlLAP and E. coli PepA and the second containing the other three hypothetical proteins. All members of the second group not only share a similar global architecture, but also share conserved active site features. Although all of these proteins can be picked up by PSI-BLAST by using SARS ADRP as a query template, it is clear that the SARS domain is closer to phosphatases of the second group.

The ADRP domain of SARS nsP3 has a deep solvent-exposed cleft on the protein surface that is very similar to that seen in AF1521, yeast Ymx7, and E. coli Er58. Surface representations showing the distribution of electrostatic potential on SARS ADRP and on the structures of ligand bound forms of AF1521 and yeast Ymx7 (shown in Figure 4A) clearly indicate that the putative active site cleft is similar in the three structures. Repeated soaking and cocrystallization attempts failed to yield cocrystals of SARS ADRP with ADP-ribose, perhaps because the active site is occluded by the dimer interface. However, the availability of the product (ADP-ribose) bound forms of AF1521 and yeast Ymx7 facilitates a detailed structure comparison of these two homologs with SARS ADRP. Many of the residues that interact with the ligand are conserved in the three structures. A view of the proposed active site of SARS ADRP along with the superimposed structures of AF1521 and yeast Ymx7 is shown in Figure 4B, highlighting the interactions that are likely between residues of the protein and the ligand. A structure-based sequence alignment of SARS ADRP with four of its structural homologs is shown in Figure 4C. The BlLAP sequence was omitted in this alignment.

Most macro-like fold proteins, including the ADRP domain from RNA viruses, show the presence of four conserved stretches of residues (Figure 1B). The first motif, “XXNAAN,” where XX are any two hydrophobic amino acids, is highly conserved across the superfamily. This is immediately followed by a Gly-rich region (GGGVAG) that is reminiscent of the Walker A motif seen in many P loop nucleotide hydrolases (Walker et al., 1982). A notable feature is that the invariant lysine of the Walker A motif is an arginine in some coronaviruses and is absent in others (Figure 1B). The third stretch, “XVVGP,” where X is often a conserved histidine, is in the middle of the polypeptide. Finally, a stretch of 4 residues mainly consisting of small hydrophilic amino acids and a glycine is present near the C terminus of the polypeptide chain (Figure 1B). Residues from the third motif occupy structurally similar positions to the Walker B motif in classical P loop hydrolases. These four regions line the putative active site of the ADRP domain of the SARS nsP3 structure. The first motif forms part of the fourth β strand (Figure 4C), while the Gly-rich segment is part of a loop that connects strand 4 with the second helix. The third motif connects strand 6 to helix H4.
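The three motifs that are spelled out explicitly above can be expressed as simple patterns and located in a protein sequence. The Python sketch below is illustrative only: the set of residues treated as hydrophobic, the function name, and the toy sequence are assumptions introduced here and are not taken from the SARS ADRP sequence or from Figure 1B. The fourth, C-terminal stretch is omitted because no explicit consensus is given for it.

```python
import re

# Assumption: residues counted as "hydrophobic" for the XX positions of motif I.
HYDROPHOBIC = "AVLIMFWC"

# Patterns for the three motifs that have an explicit consensus in the text.
MOTIFS = {
    "motif_I": re.compile(f"[{HYDROPHOBIC}]{{2}}NAAN"),  # XXNAAN
    "motif_II": re.compile("GGGVAG"),                    # Gly-rich region
    "motif_III": re.compile(".VVGP"),                    # XVVGP, X often His
}


def find_motifs(sequence: str) -> dict:
    """Return {motif name: [(1-based start, matched text), ...]}."""
    hits = {}
    for name, pattern in MOTIFS.items():
        hits[name] = [(m.start() + 1, m.group())
                      for m in pattern.finditer(sequence)]
    return hits


# Hypothetical toy sequence, constructed only to exercise the patterns.
TOY_SEQUENCE = "MKLVNAANTHLKHGGGVAGALNKHVVGPSGD"
```

Calling `find_motifs(TOY_SEQUENCE)` lists one hit per motif for the toy sequence. Applied to real macrodomain sequences, exact patterns such as these would miss the natural variation visible in an alignment, so a profile-based search of the kind cited earlier (PSI-BLAST, HMMER) remains the more appropriate tool.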

The active site can be broadly divided into the adenine binding cleft, the first ribose, and the bisphosphate binding site, followed by the terminal ribose-phosphate binding pocket that is the center of catalysis. As anticipated, the adenine binding pocket consists of largely hydrophobic residues. It is less conserved in the three structures than the other two pockets. In SARS ADRP, residues Ile23, Ala52, Pro125, and Ala154 form the walls of the putative adenine binding cleft. In the AF1521-ADP-ribose complex structure, the adenine ring is stabilized by two hydrogen bonds. One of the side chain carbonyl oxygens of Asp20 is within hydrogen bonding distance to the N1 and N6 of the adenine ring. In SARS ADRP, the equivalent residue is Asp22 and is likely to play a similar role. The other hydrogen bond is between the N7 and the backbone carbonyl group of Gly42. The binding site of the first ribose ring is a highly hydrated solvent-exposed cleft in which multiple water-mediated interactions are seen between the ribose and residues Asp177 and Ser180 in AF1521. In SARS ADRP, residues Asn156 and Asp157 that lie in a loop between strand 6 and helix H6 are likely to stabilize the ribose by forming similar polar interactions.

The α and β phosphates of the ADP moiety are mainly stabilized by backbone hydrogen bonds with the two Gly-rich motifs in a manner similar to that observed in P loop hydrolases. While the α phosphate is stabilized by hydrogen bonds with the backbone of the three consecutive glycines of motif II, the β phosphate interacts with the amides of Gly130 and Ile131. This loop also helps to stabilize the β phosphate, as the Walker B motif does in P loop hydrolases.

The terminal ribose moiety of the ADP-ribose-1″-phosphate lies on a cleft that is approximately perpendicular to the adenine binding pocket. This is the putative site of catalysis. The side chain amide nitrogen of the conserved asparagine (N80) forms hydrogen bonds with the O3 and O4 of the ribose in the yeast Ymx7-ADP-ribose complex (Figure 4D). This residue from the first motif superimposes almost perfectly with Asn40 in SARS ADRP and Asn34 in AF1521 and is invariant among all macro-like fold members (Figure 1B). Asp90 and His145, two residues that have been implicated in catalysis in yeast Ymx7, lie embedded underneath the loop that connects strand 4 and helix H2 (Figure 4D).

Given the close similarity between the three structures (SARS ADRP, AF1521, and Ymx7) and the similarity at the sequence level between SARS ADRP and yeast Poa1p (an enzyme with demonstrated ADRP activity), it was apparent that their function was likely to be similar as well. We therefore tested the ability of SARS ADRP to dephosphorylate Appr-1″-p in vitro. We employed a generic assay that monitors the liberation of inorganic phosphate in solution during catalysis (Webb, 1992). The results are shown in Figure 5. We observed a sustained release of phosphate after the addition of increasing amounts of the substrate (Appr-1″-p) to the assay containing fixed amounts of the enzyme (Figure 5A). Upon overnight incubation, the amount of phosphate released was proportional to the amount of the substrate added, suggesting that SARS ADRP indeed had the ability to dephosphorylate Appr-1″-p into ADP-ribose and inorganic phosphate (Figure 5B). Further kinetic characterization of the enzyme shows that the rate of dephosphorylation is relatively low, with a K_M of 52.7 ± 8.2 μM and a k_cat of 5.19 min⁻¹. While the observed catalytic efficiency of this enzyme is not very high, it is comparable to the values reported for Poa1p. In a TLC-based assay with radiolabeled substrates, both Poa1p and Hal2p, a known 3′ phosphatase of 5′,3′-pAp, showed similar low catalytic yields (K_M = 2.8 μM; k_cat = 1.7 min⁻¹ for Poa1p), but both enzymes were highly specific for the Appr-1″-p substrate (Shull et al., 2005). A few well-known phosphatases whose activity has been monitored by the same assay (Wang et al., 1995) also show similar levels of activity.
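To put the kinetic constants quoted above on a common scale, the following Python sketch evaluates the standard Michaelis-Menten rate law and the catalytic efficiency k_cat/K_M. It is a worked illustration, not the fitting procedure used in this study. It assumes that both Michaelis constants are in micromolar, and the function names and the enzyme concentration passed in the usage example are illustrative choices.

```python
def michaelis_menten_rate(substrate_uM: float, km_uM: float,
                          kcat_per_min: float, enzyme_uM: float) -> float:
    """Initial rate (uM product per minute) from v = kcat*[E]*[S]/(Km + [S])."""
    if substrate_uM < 0 or km_uM <= 0:
        raise ValueError("substrate must be >= 0 and Km must be > 0")
    return kcat_per_min * enzyme_uM * substrate_uM / (km_uM + substrate_uM)


def catalytic_efficiency(kcat_per_min: float, km_uM: float) -> float:
    """k_cat/K_M converted to M^-1 s^-1 (assumes Km is given in micromolar)."""
    kcat_per_s = kcat_per_min / 60.0
    km_molar = km_uM * 1e-6
    return kcat_per_s / km_molar


# Constants quoted in the text (units assumed to be uM and min^-1).
SARS_ADRP = {"km_uM": 52.7, "kcat_per_min": 5.19}
POA1P = {"km_uM": 2.8, "kcat_per_min": 1.7}
```

For example, `catalytic_efficiency(**SARS_ADRP)` is on the order of 10³ M⁻¹ s⁻¹ and `catalytic_efficiency(**POA1P)` on the order of 10⁴ M⁻¹ s⁻¹ under these unit assumptions, and `michaelis_menten_rate(52.7, 52.7, 5.19, 2.7)` returns half of the maximal rate because the substrate concentration equals K_M.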

There might be multiple reasons for the low activity levels detected in these assays. It might be intrinsic for this class of enzymes, as seen in the case of yeast Poa1p. Moreover, the released product, ADP-ribose, is a competitive inhibitor of this reaction (Shull et al., 2005) . The proposed active site is occluded at the dimer interface in the crystal structure ( Figure 2B ) and might be hindering access to the substrate in our in vitro assay. The in vivo scenario might be different, where enzyme activity might be regulated by other components of the replicase complex.

Yeast Ymx7, one of the structural homologs of SARS ADRP, has been proposed to perform the same reaction. It is a remote homolog of the macrodomain superfamily, albeit with a circular permutation (Figure 3C). It also has a different set of catalytic residues at the active site when compared to classical macrodomains. Based on the structure of the ADP-ribose bound Ymx7, Kumaran et al. (2005) have speculated on a catalytic mechanism that involves three residues: Asp90, His145, and Asn80. While the histidine and asparagine residues are conserved in all three of the structures, Ymx7, AF1521, and SARS nsP3, the equivalent position of Asp90 of Ymx7 is an alanine in the other two (Ala50 in ADRP and Ala44 in AF1521-ADP-ribose complex; Figure 4D). This would imply that while the proposed mechanism might be correct in the case of yeast Ymx7, it cannot be the mode of dephosphorylation for either ADRP or AF1521. Instead, these two enzymes have a histidine (His45 in ADRP and His39 in AF1521) residue that might be in close proximity to the terminal 1″ phosphate of the substrate and might therefore be involved in catalysis. Alternatively, it might be speculated that the role of the predominant nucleophile might be played by one of the aspartates or glutamates from the loop ¹⁰¹NAGEDIQ¹⁰⁷ in SARS and other coronaviral ADRPs. This loop shows large conformational changes in the apo and ligand bound forms of AF1521 and Ymx7 and is rich in acidic residues (Figure 1B). Further mechanistic studies, cocrystallization experiments, and mutagenesis of the residues that are implicated here are necessary to elucidate the catalytic mechanism of this enzyme. Despite repeated attempts at soaking and cocrystallization, we have not been able to observe density of the bound substrate. A possible reason might be the limited accessibility of the active site, as it is buried in the dimer interface during crystal packing (Figure 2B).

The demonstrated function of SARS ADRP as an Appr-1″-p phosphatase has important functional implications in the SARS life cycle. While the manner in which the virus infects the human host is fairly well characterized, many of the postinfection events that occur in the intracellular milieu of the host remain poorly understood. The infection process begins when the spike glycoprotein present on the viral coat recognizes one of two receptors present on the human cell surface: angiotensin-converting enzyme-2 (ACE-2) (Li et al., 2003; Kuhn et al., 2004) or a C-type lectin known as L-SIGN or CD209L (Jeffers et al., 2004). In arteri- and coronaviruses, an early postinfection event is the transcription of a nested set of subgenomic RNA (Lai and Holmes, 2001; Thiel et al., 2003). The resulting mRNAs contain a short 5′-terminal “leader” sequence derived from the 5′ end of the genome. The fusion of the two noncontiguous RNA segments is achieved by a discontinuous step in the synthesis of the minus strand and involves transcription regulatory sequences or TRSs (Pasternak et al., 2001). Very few experimental details exist on the processing, maturation, and subsequent roles of these important molecules in the viral life cycle.

This process has parallels in the eukaryotic tRNA splicing pathway that has been well studied in yeast and plants (Culver et al., 1994; Phizicky and Greer, 1993; Peebles et al., 1983). In these organisms, pre-tRNA splicing is initiated by cleavage at the splice site by an endonuclease. The resulting tRNA halves are then ligated to yield mature tRNA that retains the 2′ phosphomonoester group at the splice site (Phizicky and Greer, 1993; McCraith and Phizicky, 1990). Using NAD as an acceptor, a phosphotransferase removes the 2′ phosphate to yield ADP-ribose-1″-2″ cyclic phosphate or Appr>p (Culver et al., 1993). A cyclic phosphodiesterase hydrolyzes Appr>p to yield Appr-1″-p (Culver et al., 1994; Martzen et al., 1999). The terminal step in this pathway is a phosphatase-catalyzed conversion of Appr-1″-p into ADP-ribose and inorganic phosphate, which are channeled through various cellular metabolic pathways.

There is increasing evidence that the NendoU (nsP15) in SARS functions in a manner analogous to the endonuclease of the tRNA splicing pathway. It is a Mn²⁺-dependent enzyme that also releases products with 2″-3″ cyclic phosphorylated ends (Ivanov et al., 2004; Bhardwaj et al., 2004). While work on this enzyme was in progress, the eukaryotic homolog of NendoU, the XendoU from Xenopus laevis, was functionally characterized, highlighting the existence of an orthologous pathway in higher eukaryotes (Gioia et al., 2005). Details of this process are only beginning to emerge. It is noteworthy that, although orthologs of cyclic phosphodiesterase (CPD), the enzyme that catalyzes the previous step in the tRNA splicing pathway, have been found in group II coronaviruses along with toro- and rotaviruses, CPD is absent in the SARS virus.

SARS NendoU specifically recognizes uridylate bases at GUU sites of RNA (Ivanov et al., 2004). The virus protects its own RNA by methylating its 5′-terminal cap by using an Ado-Met-dependent RNA methyltransferase (von Grotthuss et al., 2003), a process that is imperative during coronaviral replication and an active area of therapeutic intervention (Bach et al., 1995; Vlot et al., 2002). The possibility that SARS ADRP, NendoU, and the methyltransferase might be acting in concert and might therefore be functionally linked has been the subject of previous speculation. The precise role of these three enzymes along with 3′-5′ exonuclease and RNA polymerase and their possible interaction with each other as integral components of the replicase complex remain poorly understood. It is becoming increasingly clear that coronaviruses not only differ from other related viruses in having a bigger genome size, but they also have an uncanny similarity with DNA-based life forms in their ability to maintain, synthesize, and regulate the proteomic and genomic components of their life cycle in hitherto unforeseen ways. The work presented here further reinforces this view and hints at the possibility of a tRNA splicing pathway-like process by which the generation of subgenomic RNA and its subsequent translation to yield mature viral proteins is regulated.

Orthologs of SARS ADRP are found embedded in nonstructural proteins of many related ssRNA viruses, especially in alphaviruses of Togaviridae (group II of Figure 1B). These include, among others, nsP2 of Sindbis virus (Strauss et al., 1984), nsP3 of O'nyong-nyong virus (Lanciotti et al., 1998), nsP3 of Ross River virus (Shirako and Yamaguchi, 2000), P150 of the lone nsP in Rubella virus (Zheng et al., 2003), nsP3 of Mayaro virus (Anderson et al., 1954), and nsP3 of Semliki Forest virus (Tuittila et al., 2000). Many of these viruses have a greatly reduced genome size (~10 kb), with only about 4-5 ORFs. On the other hand, the five known human coronaviruses, HCoV-OC43, HCoV-229E, HCoV-NL63, HCoV-HKU1, and SARS-CoV, have genome sizes of 27-32 kb. The occurrence of this phosphatase as part of their replicative machinery underscores the importance of this enzyme in their life cycle and hints at a similar mechanism by which their genomic and subgenomic RNA could be processed inside their respective host cells. However, given the greatly reduced proteome size and reliance of some Togaviridae members on host enzymes to meet their replication needs, this process may be somewhat different from that seen in SARS and the other human coronaviruses.

To our knowledge, this study provides the first structural characterization of a highly specific phosphatase from an RNA virus. The experimental demonstration of phosphatase activity on Appr-1″-p, combined with its structural relationship with other known macro-fold members, strongly hints at the possibility that many “hypothetical” proteins of this superfamily might in fact be phosphatases that act on similar substrates. The unique differences between the active site of SARS ADRP and yeast Ymx7, both of which dephosphorylate the same substrate, imply that, while being structurally and functionally homologous, they probably employ different catalytic mechanisms. Further studies are needed to fully explore the functional significance of this enzyme in the larger context of the membrane bound replicase complex and its regulation of translation and replication of viral RNA. If true, the functional link between SARS ADRP and other nsPs highlighted here could provide new avenues for investigation of the replication process of the virus in infected cells, with the hope of developing therapeutic agents aimed at inhibiting viral replication.

The sequence corresponding to residues 184-365 (182 aa) of SARS nonstructural protein nsP3 (gi:34555776; NP_828862) of polyprotein pp1a was amplified by polymerase chain reaction (PCR) from genomic cDNA of SARS-Tor2 strain with Taq polymerase and primer pairs encoding the predicted 5′ and 3′ ends (forward: 5′-CCAGTTAATCAGTTTACTGGTTATTTAAAACTTACTGAC-3′; reverse: 5′-CTCCTCTTGTTTAGGTGCTTCC-3′). The PCR product was cloned into plasmid pMH1f, which encodes an expression and purification tag (MGSDKIHHHHHH) at the amino terminus. Protein expression was performed on a sequence-verified clone in native 2xYT or selenomethionine (SeMET)-containing media by using the E. coli methionine auxotrophic strain DL41. Bacteria were lysed by sonication in lysis buffer (50 mM KPO4 [pH 7.8], 300 mM NaCl, 10% glycerol, 5 mM imidazole, two Roche EDTA-free protease inhibitor tablets) with 0.5 mg/ml lysozyme. Cell debris was clarified by ultracentrifugation at 45,000 rpm for 20 min (4°C), and the soluble fraction was applied onto a metal chelate column (Talon resin charged with cobalt; Clontech). The column was washed in 20 mM Tris (pH 7.8), 300 mM NaCl, 10% glycerol, 5 mM imidazole and was eluted with 25 mM Tris (pH 7.8), 300 mM NaCl, 150 mM imidazole. The resultant protein was further purified by using anion exchange chromatography on a Poros HQ column with elution buffer containing 25 mM Tris (pH 8.0) and 1 M NaCl. The pure fractions of the protein were pooled, and the buffer was exchanged into crystallization buffer (10 mM Tris [pH 7.8], 150 mM NaCl) and concentrated by centrifugal ultrafiltration. The final concentration of native and SeMET protein was 1.0 mM and 1.4 mM, respectively. The protein was either frozen in liquid nitrogen for later use or used immediately for crystallization trials.
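As a quick sanity check on the primers listed above, the Python sketch below translates the forward primer with the standard genetic code, computes its GC fraction, and reverse-complements the reverse primer to give the sense-strand sequence it anneals to. The function names are hypothetical, and the translation assumes that the coding frame starts at the first base of the forward primer, which the text does not state.

```python
FORWARD_PRIMER = "CCAGTTAATCAGTTTACTGGTTATTTAAAACTTACTGAC"
REVERSE_PRIMER = "CTCCTCTTGTTTAGGTGCTTCC"

# Standard genetic code, codons enumerated in TCAG order at each position.
_BASES = "TCAG"
_AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
                "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: _AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(_BASES)
    for j, b2 in enumerate(_BASES)
    for k, b3 in enumerate(_BASES)
}


def translate(dna: str) -> str:
    """Translate complete codons from the first base (frame is an assumption)."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def gc_fraction(dna: str) -> float:
    """Fraction of G and C bases in the sequence."""
    return (dna.count("G") + dna.count("C")) / len(dna)


def reverse_complement(dna: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]
```

Here `translate(FORWARD_PRIMER)` gives the peptide encoded at the 5′ end of the insert under the stated frame assumption, and `reverse_complement(REVERSE_PRIMER)` gives the corresponding 3′ end of the amplified coding strand.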

The protein was crystallized with the nanodroplet vapor diffusion method (Santarsiero et al., 2002) by using standard JCSG crystallization protocols (Lesley et al., 2002). Thick, rectangular, rod-like crystals (~200 μm × ~100 μm × ~75 μm) appeared after 10 days in 0.4 μl drops containing 0.2 μl each of protein and crystallization well solution containing 1.5 M sodium malonate (pH 7.0). A higher concentration (1.8 M) of sodium malonate with 25% glycerol was used as cryoprotectant. A native 1.4 Å dataset (at a wavelength of 0.9794 Å) was collected at beamline 11.1 of the Stanford Synchrotron Radiation Laboratory by using Blu-ICE (McPhillips et al., 2002). Anomalous diffraction data were collected at the Advanced Light Source (ALS, Berkeley, CA) on beamline 8.2.1 at a wavelength of 0.97941 Å, corresponding to the peak wavelength of a selenium SAD experiment. Reflections were indexed in a primitive orthorhombic lattice (space group P2₁2₁2₁), integrated, and scaled by using the HKL2000 suite (Otwinowski and Minor, 1997).

The initial phases were obtained by the single-wavelength anomalous dispersion (SAD) phasing method with data collected to 2.2 Å at the selenium peak wavelength by using the program SOLVE (Terwilliger and Berendzen, 1999). All 12 selenium sites were located, and the resulting phases had a figure of merit of 0.54 after density modification procedures by using RESOLVE (Terwilliger, 2003). The resultant phases from SAD were merged, improved, and extended for a native data set to 1.4 Å by using the programs CAD and DM as implemented in the CCP4 package (CCP4, 1994), assuming four monomers in the ASU with a Matthews coefficient of 2.6 and a solvent content of 51% (Cowtan, 1994). Automated model building with Arp/wARP (ver 6.0; Lamzin and Wilson, 1997) traced ~80% of the backbone and docked 65% of the sequence, including the side chains. The rest of the sequence was manually built into the density with O (Jones et al., 1991) and was refined against the high-resolution native data to 1.4 Å with iterative rounds of model building and refinement by using Refmac5 (Murshudov et al., 1997) of CCP4. Although RESOLVE did identify the presence of NCS among the monomers, NCS was not used at any stage of refinement. A summary of data collection and refinement statistics is shown in Table 1. The stereochemical quality of the final refined model was checked with Procheck (Laskowski et al., 1993), and the dimer interface was calculated by using the protein-protein interaction server. The ribbon diagrams were made with Pymol (DeLano, 2002).

The substrate Appr-1″-p was a kind gift from Prof. Phizicky (Rochester Univ, USA) and was prepared enzymatically by reacting the precursor Appr>p with cyclic phosphodiesterase (CPD) according to procedures described in Shull et al. (2005). Phosphate release was monitored with the EnzChek assay (Molecular Probes Inc, Eugene, OR, USA) according to the manufacturer's instructions. The assay uses the method of Webb (1992), which monitors the release of inorganic phosphate by coupling the phosphatase reaction to the enzymatic conversion of 2-amino-6-mercapto-7-methylpurine riboside (MESG) to 2-amino-6-mercapto-7-methylpurine and ribose-1-phosphate by purine nucleoside phosphorylase (PNP). The substrate MESG has an absorbance maximum at 330 nm, whereas that of the product is at 360 nm. Each 1 ml reaction mixture contained 50 mM Tris (pH 7.5), 1 mM MgCl₂, 0.1 mM sodium azide, 200 μM MESG, 1 U purine nucleoside phosphorylase, and 2.7 μM enzyme. Increasing amounts of the substrate were added to the reaction mixture, and the ADRP reaction was monitored by changes in absorbance at 360 nm in a UV spectrophotometer. To check for phosphate contamination, control reactions were performed with enzyme but no substrate, and with substrate but no enzyme. No measurable phosphate contamination was detected from the enzyme preparation, from substrate degradation, or from the buffers. The assay components were checked with known amounts of the phosphate standard supplied by the manufacturer. A molar extinction coefficient of 11,000 M⁻¹ cm⁻¹ for the product of the PNP reaction at 360 nm was used to quantitate the amount of released inorganic phosphate (Etzkorn et al., 1994).
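The quantitation step described above follows the Beer-Lambert law, c = ΔA / (ε · l). The short Python sketch below shows the arithmetic under stated assumptions: the 1 cm path length and the example absorbance change are hypothetical values chosen for illustration, and the function names are not from the original work. The extinction coefficient and the 1 ml reaction volume are the values quoted in the text.

```python
# Convert an absorbance change at 360 nm into released inorganic phosphate.
EPSILON_360 = 11_000.0     # M^-1 cm^-1, extinction coefficient from the text
REACTION_VOLUME_L = 1e-3   # 1 ml reaction volume, from the text


def phosphate_concentration_molar(delta_a360: float,
                                  path_length_cm: float = 1.0) -> float:
    """Beer-Lambert law: concentration (M) = delta A / (epsilon * path)."""
    if path_length_cm <= 0:
        raise ValueError("path length must be positive")
    return delta_a360 / (EPSILON_360 * path_length_cm)


def phosphate_released_nmol(delta_a360: float,
                            path_length_cm: float = 1.0,
                            volume_l: float = REACTION_VOLUME_L) -> float:
    """Amount of phosphate released in the reaction volume, in nanomoles."""
    conc = phosphate_concentration_molar(delta_a360, path_length_cm)
    return conc * volume_l * 1e9


if __name__ == "__main__":
    # Hypothetical absorbance change, assuming a 1 cm cuvette.
    delta_a = 0.11
    print(f"{phosphate_released_nmol(delta_a):.2f} nmol phosphate released")
```

Because one phosphate is consumed per MESG converted in the coupled reaction, the concentration of the 360 nm product is taken here to equal the concentration of phosphate released.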

The crystal structure of AF1521, a protein from Archaeoglobus fulgidus with homology to the non-histone domain of macroH2A

Gapped BLAST and PSI-BLAST: a new generation of protein database search programs

Mayaro virus: a new human disease agent. II. Isolation from blood of patients in Trinidad

Effect of methyltransferase inhibitors on the regulation of baculovirus protein synthesis

Programmed ribosomal frameshifting in decoding the SARS-CoV genome

The severe acute respiratory syndrome coronavirus Nsp15 protein is an endoribonuclease that prefers manganese as a cofactor

Molecular structure of leucine aminopeptidase at 2.7-Å resolution

The CCP4 Suite: programs for Protein Crystallography

CarP, involved in pyrimidine regulation of the Escherichia coli carbamoyl-phosphate synthetase operon encodes a sequence-specific DNA-binding protein identical to XerB and PepA, also required for resolution of ColE1 multimers

DM: an automated procedure for phase improvement by density modification

An NAD derivative produced during transfer RNA splicing: ADP-ribose 1″-2″ cyclic phosphate

tRNA splicing in yeast and wheat germ. A cyclic phosphodiesterase implicated in the metabolism of ADP-ribose 1″-2″ cyclic phosphate

Poly(ADP-ribosyl)ation reactions in the regulation of nuclear functions

The PyMOL Molecular Graphics System

Identification of a novel coronavirus in patients with severe acute respiratory syndrome

Hidden Markov models

Cyclophilin residues that affect the noncompetitive inhibition of the protein serine phosphatase activity of calcineurin by the cyclophilin-cyclosporin A complex

Functional characterisation of XendoU, the endoribonuclease involved in small nucleolar RNA biosynthesis

Severe acute respiratory syndrome: global initiatives for disease diagnosis

Protein structure comparison by alignment of distance matrices

Major genetic marker of nidoviruses encodes a replicative endoribonuclease

CD209L (L-SIGN) is a receptor for severe acute respiratory syndrome coronavirus

Improved methods for building protein models in electron density maps and the location of errors in these models

A novel coronavirus associated with severe acute respiratory syndrome

Angiotensin-converting enzyme 2: a functional receptor for SARS coronavirus

Structure and mechanism of ADP-ribose-1″-monophosphatase (Appr-1″-pase), a ubiquitous cellular processing enzyme

Coronaviruses

Automated refinement for protein crystallography

Emergence of epidemic O'nyong-nyong fever in Uganda after a 35-year absence: genetic characterization of the virus

PROCHECK: a program to check the stereochemical quality of protein structures

Structural genomics of the Thermotoga maritima proteome implemented in a high-throughput structure determination pipeline

SMART 4.0: towards genomic data integration

Angiotensin-converting enzyme 2 is a functional receptor for the SARS coronavirus

A biochemical genomics approach for identifying genes by the activity of their products

A highly specific phosphatase from Saccharomyces cerevisiae implicated in tRNA splicing

Peptidase activity of Escherichia coli aminopeptidase A is not required for its role in Xer site-specific recombination

Blu-Ice and the Distributed Control System: software for data acquisition and instrument control at Macromolecular crystallography beamlines

The InterPro Database, 2003 brings increased coverage and new features

Refinement of macromolecular structures by the maximum-likelihood method

SCOP: a structural classification of proteins database for the investigation of sequences and structures

Processing of X-ray diffraction data collected in oscillation mode

Sequence requirements for RNA strand transfer during nidovirus discontinuous subgenomic RNA synthesis

Precise excision of intervening sequences from precursor tRNAs by a membrane-associated yeast endonuclease

MacroH2A, a core histone containing a large nonhistone region

Evolutionary conservation of histone macroH2A subtypes and domains

Coronavirus as a possible cause of severe acute respiratory syndrome

Pre-tRNA splicing: variation on a theme or exception to the rule?

Comparison of sequence profiles. Strategies for structural predictions using sequence information

An approach to rapid protein crystallization using nanodroplets

Genome structure of Sagiyama virus and its relatedness to other alphaviruses

A highly specific phosphatase that acts on ADP-ribose 1″-phosphate, a metabolite of tRNA splicing in Saccharomyces cerevisiae

Unique and conserved features of genome and proteome of SARS-coronavirus, an early split-off from the coronavirus group 2 lineage

Two-metal ion mechanism of bovine lens leucine aminopeptidase: active site solvent structure and binding mode of L-leucinal, a gem-diolate transition state analogue, by X-ray crystallography

X-ray structure of aminopeptidase A from Escherichia coli and a model for the nucleoprotein complex in Xer site-specific recombination

Complete nucleotide sequence of the genomic RNA of Sindbis virus

Statistical density modification using local pattern matching

Automated MAD and MIR structure solution

Mechanisms and enzymes involved in SARS coronavirus genome expression

Replicase complex genes of Semliki Forest virus confer lethal neurovirulence

Role of the alfalfa mosaic virus methyltransferase-like domain in negative-strand RNA synthesis

mRNA cap-1 methyltransferase in the SARS genome

Distantly related sequences in the α- and β-subunits of ATP synthase, myosin, kinases and other ATP-requiring enzymes and a common nucleotide binding fold

A continuous spectrophotometric assay for phosphorylase kinase

A continuous spectrophotometric assay for inorganic phosphate and for measuring phosphate release kinetics in biological systems

Characterization of genotype II Rubella virus strains

The autocatalytic release of a putative RNA virus transcription factor from its polyprotein precursor involves two paralogous papain-like proteases that cleave the same peptide bond

The authors gratefully acknowledge Alexei Brooun and Amy Houle for help with cloning SARS targets. Bioinformatics support for this project was provided by Enrique Abola, Anand Kolatkar, and Sophie Coon of The Scripps Research Institute and Weizhong Li and Adam Godzik of the Burnham Institute. The authors thank Neil Shull and Prof. Eric M. Phizicky from Rochester University for providing the substrate Appr-1″-p. We also acknowledge the beamline staff at the Advanced Photon Source (GM/CA-CAT), Advanced Light Source (BL-8.2.1), and Stanford Synchrotron Radiation Laboratory (SSRL) (BL-11.1) synchrotron facilities for help in data collection. SSRL BL11-1 is supported by the National Institutes of Health (NIH) National Center for Research Resources, NIH National Institute of General Medical Sciences, Department of Energy, Office of Biological and Environmental Research, Stanford University, and The Scripps Research Institute (TSRI). The General Medicine and Cancer Institutes Collaborative Access Team is supported by the National Cancer Institute (Y1-CO-1020) and the National Institute of General Medical Sciences (Y1-GM-1104). This study was supported by National Institute of Allergy and Infectious Diseases/NIH Contract # HHSN 266200400058C "Functional and Structural Proteomics of the SARS-CoV" to P. K. TSRI manuscript 17502-CB.

The final refined coordinates and the structure factors have been deposited in the Protein Data Bank under ID code 2ACF.