key: cord-0007091-wzz5t3sv authors: Gorbalenya, Alexander E.; Snijder, Eric J. title: Viral cysteine proteinases date: 1996 journal: Perspect Drug Discov Des DOI: 10.1007/bf02174046 sha: b3a8f18bedad0bc7c198dc960c89b362f1b2587a doc_id: 7091 cord_uid: wzz5t3sv

Dozens of novel cysteine proteinases have been identified in positive single-stranded RNA viruses and, for the first time, in large double-stranded DNA viruses. The majority of these proteins are distantly related to papain or chymotrypsin and may be direct descendants of primordial proteolytic enzymes. Virus genome synthesis and expression, virion formation, virion entry into the host cell, as well as cellular architecture and functioning can be under the control of viral cysteine proteinases during infection. RNA virus proteinases mediate their liberation from giant multidomain precursors in which they tend to occupy conserved positions. These proteinases possess a narrow substrate specificity, can cleave in cis and in trans, and may also have additional, nonproteolytic functions. The mechanisms of catalysis, substrate recognition and RNA binding were highlighted by the recent analysis of the three-dimensional structure of the chymotrypsin-like cysteine proteinases of two RNA viruses.

a Only proteinases with proven activity are listed. b Virus classification is according to Murphy et al. [34] with some subsequent modifications; HyAV is a double-stranded RNA plasmid related to (+)RNA viruses. c Only one virus, without referring to possible serotype/strain origin, for each family/group/genus is shown, though proteinases could be identified in a number of related viruses. CPMV: cowpea mosaic virus; CSFV: classic swine fever virus; RuV: rubella virus; TYMV: turnip yellow mosaic virus; BBScV: blueberry scorch carlavirus; BYV: beet yellow virus. For other abbreviations see the text. d Host range may differ for other viruses of a particular family/group/genus.
Plant and some animal viruses may be transmitted to the host by (insect) vectors in which they are able to multiply. The alternative name of the proteinase is in parentheses. The proteolytic activity of nsp2 was demonstrated for EAV rather than PRRSV; this virus also encodes an inactivated copy of nsp1α. PL1 proteinase is not conserved in a related coronavirus IBV. The cysteine nature of the CSFV Npro was predicted. f References were limited to reviews if proteinases were characterized before 1990.

What unites them is that genome expression always starts with RNA translation rather than transcription, which is common for other classes of viruses and for cellular organisms. Due to this unique strategy, (+)RNA viruses have developed a number of mechanisms to control genome expression at the translational and post-translational level, and proteolytic processing of polyproteins is one of them. The viral origin of a cysteine proteinase involved in viral polyprotein processing was first established for encephalomyocarditis virus (EMCV) [35]. The proteolytic activity was ascribed to the EMCV 22K protein (now known as the 3C proteinase, 3Cpro) encoded in the polymerase region [36-38]. It was purified from infected cells and was shown to be a cysteine proteinase by an inhibitor analysis [39]. The EMCV 3Cpro was found to be responsible for the maturation of the capsid proteins, which are cleaved in trans, and for the production of nonstructural proteins, which can be proteolytically processed in cis and in trans [40]. After the introduction of genetic engineering methods in RNA virus research, full-length cDNA clones that can be used to generate infectious transcripts in vitro have been developed for the majority of the (+)RNA virus family prototypes. Numerous RNA viral genomes have been sequenced and analyzed. 3Cpro was shown to be conserved in the polyproteins of picornaviruses and, surprisingly, also in plant como-, nepo- and potyviruses.
In these viruses, the enzyme is a member of an array of conserved replicative domains [41-43]. Proteolytic activity was detected in 3C-containing polypeptides of different picorna- and related viruses. Furthermore, the PV 3CD protein, a precursor of 3Cpro and the 3D RNA polymerase (3Dpol), rather than 3Cpro itself was shown to be the major enzyme involved in processing of the capsid protein precursor P1 [44, 45]. The narrow substrate specificity of 3Cpro became evident with the discovery that nine out of 12 available Q/G dipeptides in the PV polyprotein were cleaved by this enzyme [46]. The 3Cpro of other picornaviruses and the 3C-like (3CL) proteinases of related plant viruses have a similar but more relaxed specificity. Only one conserved cysteine residue, thought to be the catalytic nucleophile, was identified in 3Cpro sequences of different origin [41, 47] and its indispensability for the PV 3Cpro proteolytic activity was proven by site-directed mutagenesis [48]. A local sequence similarity was noticed around the principal catalytic residue of 3Cpro and cellular serine proteases of the chymotrypsin-like (CHL) family rather than cysteine proteases of the papain-like (PL) family [47]. This initial observation was further extended by modeling of the CHL double β-barrel fold and the prediction of the active site organization in the viral enzymes [49, 50]. The basic features of these models have gained support from a site-directed mutagenesis study of the potyvirus tobacco etch virus (TEV) 3CLpro, the Nuclear Inclusion a (NIa) protein [51]. In addition to 3CLpro enzymes, a number of other thiol proteinases have been identified in (+)RNA viruses. A second proteinase, which is distantly related to 3Cpro, was recognized in the PV replicative protein 2A (2Apro) [52, 53]. It was shown to be conserved in entero- and rhinoviruses but not in other viruses [49; Table 1].
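The dipeptide counts used above to describe 3Cpro specificity (candidate Q/G sites in a polyprotein) can be illustrated with a minimal sketch. The function name `find_dipeptide_sites` and the toy sequence are hypothetical and chosen for illustration only; the sequence is not the PV polyprotein, and finding a dipeptide says nothing about whether it is actually cleaved, since the text notes that only a subset of the available Q/G dipeptides is processed.

```python
def find_dipeptide_sites(sequence: str, p1: str, p1_prime: str) -> list[int]:
    """Return the 1-based positions of the first residue (P1) of every
    p1/p1_prime dipeptide found in a one-letter amino acid sequence."""
    seq = sequence.upper()
    return [
        i + 1
        for i in range(len(seq) - 1)
        if seq[i] == p1 and seq[i + 1] == p1_prime
    ]


# Hypothetical toy sequence (NOT a real viral polyprotein).
toy_sequence = "MAQGSTYGLLQGPEQAKQG"

# Candidate sites of the Q/G type (3Cpro prototype) and Y/G type (2Apro).
qg_sites = find_dipeptide_sites(toy_sequence, "Q", "G")
yg_sites = find_dipeptide_sites(toy_sequence, "Y", "G")
```

Such a scan only enumerates candidate sites; which of them are used has to be established experimentally, as was done for the PV polyprotein.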
2Apro cleaves a pair of Y/G dipeptides in the PV polyprotein in cis and in trans, respectively [46, 53]. This enzyme was also implicated in the proteolytic inactivation of cellular cap-binding factor eIF4G (formerly p220) in infected cells [54]. Proteolytic activity has also been assigned to the C-terminal domain of the nsP2 protein of the animal alphavirus Sindbis virus (SinV) [55, 56] and to the Helper Component (HC) of TEV [57]. Both proteinases were proposed to be distant relatives of cellular PL proteases. nsP2 is the major proteinase of alphaviruses, mediating three cleavages in the nonstructural polyprotein (see below). In contrast, the cis-acting HCpro cleaves only one site, consisting of a pair of glycine residues which separate HC and the downstream domain [58]. A number of CHL proteinases related to 3C/2A proteinases, but employing the canonical Ser-His-Asp catalytic triad, were also identified in (+)RNA viruses. This article briefly discusses fundamental and applied aspects of the structure-function organization of viral cysteine proteinases in the light of progress made in the last several years. Viral proteinases have also been reviewed by other researchers [3, 4, 6, 17, 59-62]. Recently, the first DNA-virus-encoded cysteine proteinases were identified in adenoviruses and baculoviruses. The well-known adenovirus (Ad) 23K capsid proteinase encoded by the L3 gene was demonstrated to be a thiol enzyme [63, 64]. It is a rather unusual proteinase with no apparent relatives among other proteolytic enzymes [62]. The cysteine proteinase v-cath of two baculoviruses, nuclear polyhedrosis virus of Bombyx mori (BmNPV) and Autographa californica multiple NPV (AcMNPV), shares more than 30% of identical residues with some of the cellular PL proteases [65, 66]. The NPV v-cath is encoded between the 78.7 and 79.4 map units of the genome and consists of 323 amino acids (aa). This proteinase is also conserved in other baculoviruses.
Many new cysteine proteinases have been described in non-structural proteins (nsp, nsP or NS) of animal and plant (+)RNA viruses. They are distantly related to either CHL proteinases [10-12, 14-16, 18, 67, 68] or PL proteinases [8, 9, 19, 20, 23, 27-31, 69]. Two weakly related PL proteinases (p29 and p48) have been identified in the N-terminus of the ORFA and ORFB polyproteins, respectively, encoded by the hypovirulence-associated virus (HyAV) [32, 33]. This agent is a virus-like RNA that replicates autonomously and is associated with a reduced virulence of the chestnut blight fungus, Cryphonectria parasitica. It is distantly related to potyviruses [70]. The cysteine proteinases in equine arteritis virus (EAV) nsp2 [21] and hepatitis C virus (HepCV) NS2-3 [24,25; A.E. Gorbalenya, unpublished data] remain to be classified (Table 1). Furthermore, sequence analyses predicted CHL proteinases in plant bymoviruses [71], and in the sequiviruses rice tungro spherical virus (RTSV) [72] and parsnip yellow fleck virus (PYFV) [73]. PL proteinases were tentatively identified in hepatitis E virus (HepEV) [74], plant furoviruses [29, 75], capilloviruses, and in the related apple stem pitting (ASPV) and apple chlorotic leaf spot viruses (APCLV) [29]. The coronavirus mouse hepatitis virus (MHV) probably encodes a second PL proteinase (PL2pro) which is conserved in other members of the group [76, 77]. A duplication of PLpro was also claimed for the closterovirus citrus tristeza virus (CTV) [78]. All RNA viral cysteine proteinases are synthesized as a domain in a polyprotein [17], and three major trends were discovered with respect to the position occupied by these enzymes. To discuss them, it is relevant to note that only a limited number of variations in the arrangement of conserved nonstructural domains in the polyproteins of (+)RNA viruses have been recognized.
Each conserved layout is characteristic of a particular virus supergroup which encompasses a number of virus families/groups [79]. The majority of the CHL cysteine proteinases belong to the 3CL enzymes and occupy the central position in the array of conserved domains N-helicase-CHLpro-polymerase-C of the Picornavirus-like supergroup, which includes the picorna-, como-, nepo-, sequi-, poty-, and caliciviruses. The coronavirus 3CLpro and picornavirus 2Apro do not fit in this constellation. Each of these proteinases is the prototype of a small group of cysteine proteinases which is found only in these virus families (Figs. 1A and C). Two major trends in the positioning of PL proteinases are also evident [80]. In a number of viruses of the Alphavirus-like supergroup, e.g. alpha-, rubi-, tymo-, and carlaviruses and probably some others, PLpro is associated with a helicase domain within the array N-methyltransferase-helicase-polymerase-C. The position of PLpro itself is not conserved and, depending on the virus, it may be located either upstream or downstream of the helicase domain (Fig. 1B). The rest of the PL proteinases, e.g. those of foot-and-mouth disease virus (FMDV), HyAV, coronaviruses, arteriviruses, closteroviruses, and pestiviruses, and the unclassified nsp2 proteinase of arteriviruses are encoded at or close to the N-terminus of the polyprotein. Some, but not all, of the corona-, arteri- and closteroviruses encode a tandem of related PL proteinases [20, 76, 78] (Fig. 1C). After, and sometimes during, polyprotein synthesis, a proteinase always releases itself, with the possible involvement of other factors. A cleavage product containing only the proteolytic domain can be produced, e.g. 2Apro or 3Cpro, or a multidomain protein which contains the proteinase can be generated. The latter may be stable, e.g. nsP2, or may be further processed into the mature products, as in the case of e.g. 3CD or NIa.
The 3CL and alphavirus nsP2 proteinases are examples of autonomous enzymes that completely control the proper processing of both their termini in cis and/or trans [6, 59] (Fig. 1). Other proteinases need 'external' assistance to be released. Some of the N-terminally located PL proteinases are delimited by the N-terminal methionine of the protein and a cognate cleavage site at the C-terminus. One of these proteinases, FMDV Lpro, is produced as a mixture of two forms, Lab and Lb, whose synthesis is initiated at two different in-frame AUG codons, respectively. Both forms of the proteinase appear to be functionally competent, although Lb is the predominant form [81]. The synthesis of the PV 3CDpro is also under ribosomal control due to downstream termination of translation. A number of PL proteinases rely upon another proteinase to generate their N-termini. The 'assisting' proteinase occupies an upstream position in the polyprotein relative to the 'target' proteinase. The nsp1α and nsp1β proteins of the arteriviruses lactate dehydrogenase-elevating virus (LDV) and porcine reproductive and respiratory syndrome virus (PRRSV), and nsp1 and nsp2 of EAV are examples of such pairs [20, 21, 69]. This pattern of proteinase organization and cooperation appears to be common among (+)RNA viruses. A reversed pattern was observed for 2Apro expression. The N-terminus of this proteinase is cleaved autoproteolytically, and its C-terminus is released by the downstream located 3Cpro (Figs. 1A and C). A very unusual pattern of expression was described for the HepCV NS2-3 proteinase, which encompasses the C-terminal part of NS2, the NS2/NS3 cleavage site and the N-terminal domain of NS3. The NS2-3 proteinase cleaves the NS2/NS3 site and thereby mediates its own inactivation during polyprotein processing [24, 25]. Mature proteinase molecules may also be processed further.
For two closely related 3CL proteinases encoded by TEV [82] and turnip mosaic potyvirus (TuMV) [83, 84], autocatalytic cleavage within their C-terminal region has recently been observed in vitro. The truncated proteinases retained partial activity. The extensive degradation of the EMCV 3Cpro has been described in vivo and in vitro [85]. This is an ATP-dependent process that is probably mediated by a cellular ubiquitin-driven proteolytic system [85,86a]. The turnover of 3Cpro of other origin may be regulated in the same way [86b]. The cysteine proteinases of two DNA virus families display different expression patterns. The mature AcMNPV v-cath has a size of 27.5K and is produced from 35.5K and 32K precursors in the course of zymogen activation [66], whereas the Ad 23K proteinase is translated as a mature enzyme [87].

The essential nature of the enzymes

When tested, the majority of (+)RNA viral cysteine proteinases appeared to be essential for viral growth. Even when this was not the case, as demonstrated for the FMDV Lpro [88], proteinase inhibition was always deleterious for virus production. This was because of an indirect effect of proteinase inactivation on the function of other proteins, which were not correctly processed. Among the DNA virus enzymes, the Ad 23K proteinase was essential for virion maturation and the abrogation of its activity resulted in the production of immature and noninfectious virus particles [62]. In contrast, the reproduction of the baculovirus BmNPV was essentially independent of the activity of v-cath [65]. Cysteine proteinases are involved in every stage of (+)RNA virus reproduction including three coupled processes: RNA translation, RNA synthesis, and capsid formation. The enzymes can play an indirect role, mediating the production of an active viral factor, or a direct role, being intimately involved in one of these processes.
The SinV nsP2 was found to control the switch from negative (-) strand to genomic RNA synthesis. It directs the proteolytic processing of the nsP1234 polyprotein following one of two alternative pathways. This leads to the production of a replicative complex that generates either (+) or (-) RNA [89]. 2Apro of different entero- and rhinoviruses stimulated the translation of different constructs which was driven by the cognate 5'-non-translated region (NTR) of viral RNA [90-94]. The stimulation did not require the assistance of other viral proteins. A proteolytically competent 2Apro was also required for wild-type levels of PV RNA replication [95-98]. This activity of 2Apro was claimed to be unrelated to its role in polyprotein processing and in trans-activation of translation [95, 98]. Another picornavirus proteinase, 3CDpro (3Cpro), has also been implicated in RNA replication. This proteinase seems to be a component of the ribonucleoprotein (RNP) complex involved in (+)RNA synthesis [99-104]. Evidence has also been presented for an additional, as yet unidentified, role of the TEV HCpro in genome replication [105]. Another proteinase of TEV, the 3CL proteinase, regulates its cellular localization by cleaving the 6K/NIa bond. The mature form of the proteinase, the NIa protein, is associated with cytoplasmic structures, whereas the 6K-NIa covalent complex is transported to the nucleus [106]. The Ad2 23K protein may be another proteinase with multiple roles in infection. The 23K-directed proteolytic maturation of the Ad capsid is essential to disrupt cellular membranes at pH 5. Furthermore, only virions containing the proteolytically active 23K protein are competent for cellular entry [107]. The 23K proteinase may influence DNA replication since it mediates the proteolytic conversion of a DNA-binding preterminal protein (pTP) into three products with modified functional characteristics [108].
Cysteine proteinases can modify the infected cell for the purpose of virus reproduction. The translational switch from a cap-dependent to a cap-independent mechanism, which is used by entero- and rhinoviruses, employs 2Apro. In aphthoviruses, Lpro seems to play a comparable role. These proteinases cleave cap-binding eIF4G into two parts and, in this manner, they contribute to the inactivation of cellular mRNA translation [109-111]. 2Apro may also be responsible for the inhibition of cellular replication and RNA-polymerase (pol) II-dependent transcription which was observed in PV-infected cells. Both processes were reduced profoundly upon the transient expression of 2Apro in monkey cells [112]. Furthermore, the PV 3Cpro appears to assist 2Apro in the control of polII-directed transcription and may be responsible for the inhibition of polIII-mediated RNA synthesis. 3Cpro is involved in the modification of polIII Transcription Factor IIIC (TFIIIC) [113] and also cleaves the TATA-Binding Protein (TBP), which is the DNA-binding component of polII Transcription Factor IID (TFIID) [114, 115]. In FMDV-infected cells the inhibition of cellular transcription seems to be mediated by Lpro. This proteinase induces the N-terminal truncation of histone H3 in several test systems, although the cleavage was not reproduced upon incubation of purified substrate with the enzyme [116,117]. In addition to biosynthetic processes, some picornaviruses, including entero- and rhinoviruses, affect cell architecture late in infection. The PV 3Cpro or 3CDpro may be involved in this process since they can cleave Microtubule-Associated Protein 4 (MAP-4) into products detected in PV-infected cells [118]. DNA-virus-encoded proteinases seem to function late in infection to facilitate virus release and spread. The baculovirus v-cath degrades host tissues [66] and the Ad 23K proteinase was shown to cleave cytokeratin 18 and may also attack other proteins [119-121].
One of the remarkable features of viral cysteine proteinases is their ability to function in cis, in a monomolecular reaction. All CHL cysteine proteinases and a fraction of the PL proteinases, represented by the alphavirus nsP2, the FMDV Lpro and the MHV PLpro, are thought to cleave both in cis and in trans. The potyvirus HCpro and many recently identified PL proteinases, e.g. those encoded by corona- and arterivirus genomes, seem to act exclusively in cis. Several lines of evidence, albeit indirect, for the cis-activity of these proteinases were presented. Cleavage by nsP2 of the SinV nsP1/nsP2 site, but not of the nsP2/nsP3 junction, was fast and insensitive to dilution [56]. Upon in vitro translation of a number of viral RNAs, some cleavages were so fast that precursor proteins could only be observed upon processing inhibition [19, 20, 23, 69, 122]. However, it remained possible that two precursor molecules could be tightly associated and rapidly cleave each other, in an inter- rather than intramolecular fashion. To discriminate between these two possibilities, a system was employed in which a fraction of the molecules carried a defective proteinase and served as substrate for trans-cleavage by the same kind of molecules carrying an active proteinase. When tested in this way, the arterivirus and tymovirus PL proteinases did not show the intermolecular activity mentioned above, although the cleavage site in the molecule containing the active proteinase was cleaved efficiently [20, 29, 69]. For many other proteinases, including the FMDV Lpro and the PV 2Apro, which demonstrated intermolecular activity in addition to apparent intramolecular activity, these two types of activities could also be differentiated. The processing mediated by these proteinases in cis and in trans responded differently to probing of the proteinase or the cleavage site structure by point mutations [8,122,123a].
The tertiary structure of the 3Cpro of the picornaviruses human rhinovirus-14 (HRV-14) and hepatitis A virus (HAV) did not provide firm evidence for its ability to cleave in cis [5, 13]. The N- and C-termini of the molecule were found to be positioned far from the active center (Fig. 2) and could not be aligned with it by simple rotation of terminal amino acid residues. However, in an N-terminally extended precursor of 3Cpro, the N-terminal region of the proteinase could adopt a different structure that would fit the substrate binding pocket and might be processed in cis [5]. Such a reorganization seems impossible for the C-terminus of 3Cpro in the 3CD precursor. From the analysis of intermolecular interactions of eight 3Cpro molecules in the crystal lattice, a model for the interaction of two 3CD molecules was developed which implies that the 3C/3D bond is cleaved in an intermolecular fashion [5]. The initial processing of polyproteins is thought to proceed exclusively in cis. The PV polyprotein was shown to contain two primary sites (2A/2B and 2C/3A) that are recognized by 3Cpro. Cleavage of either one of these initiated an alternative proteolytic cascade that may be typical of the early or late phase of PV reproduction, respectively [124]. A similar regulation of polyprotein processing may be used by other viruses [89, 125]. Recently, a number of viral CHL cysteine proteinases of different origin as well as the Ad 23K proteinase became available in significant quantities by purification from different expression systems. They were able to cleave peptides mimicking their authentic cleavage sites in reactions which were quantitatively monitored [4, 61]. The 3Cpro encoded by HRV-14, HAV and the cardiovirus Mengo virus (MV), and the 2Apro of HRV-2 have a pH optimum in the range from 7.0 to 8.5 and were most effective on peptide substrates at temperatures close to 37 °C [101, 126-129].
The intramolecular activity of the nepovirus grapevine fanleaf virus (GFLV) 3CLpro was maximal in the same pH range at 30 °C [15]. When tested using a 16-residue peptide containing its cognate 2C/3A site, the MV 3Cpro appeared to be the most effective of the 3CL enzymes examined, with a Km value of 0.3 mM and Vmax being 5.9 µmol/min/mg [129]. The HRV-14 and HAV 3Cpro as well as the TuMV 3CLpro have Km values that are ~7 times higher than that of the MV 3Cpro, although some of them cleaved substrates at a higher rate [83, 101, 126, 130]. The specific activity of the Ad 23K proteinase was 12.9 nmol/min/nmol enzyme [61]. The prototype cleavage sites recognized by the PV 3Cpro and the TEV HCpro are Q/G and G/G, respectively (see the Introduction). Variations of these sites dominate over others in the natural substrates of the majority of recently characterized proteinases [31, 33, 69, 72, 131-140]. The residues at either the P1 or the P1' position, or sometimes at both positions, can vary to different extents in substrates of viral proteinases (Fig. 3). In addition to P1 and P1', other positions around the scissile bond may also be occupied by conserved residues [135, 148, 149]. Natural substrates bearing point mutations and peptide substrates of different structure were used to define the cleavage site determinants for a number of proteinases. The minimal peptide that was cleaved efficiently by the HRV-14 3Cpro consisted of six amino acid residues, the scissile bond being located between the fourth and fifth residues. The optimal substrate for the proteinase was a symmetrical 16-mer [150]. To be cleaved in the context of a polyprotein, a potential substrate should be accessible to the proteinase [151].
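To make the kinetic constants quoted above for the MV 3Cpro more tangible, the following minimal sketch evaluates the Michaelis-Menten rate law, v = Vmax[S]/(Km + [S]). It assumes that simple Michaelis-Menten kinetics apply, which the quoted Km and Vmax values suggest but the text does not state explicitly. The function name is hypothetical, and the second enzyme (sevenfold higher Km, identical Vmax) is an illustrative assumption, because no Vmax values are given here for the other proteinases.

```python
def michaelis_menten_rate(substrate_mM: float, km_mM: float, vmax: float) -> float:
    """Initial rate v = Vmax * [S] / (Km + [S]); v carries the units of vmax."""
    if substrate_mM < 0 or km_mM <= 0 or vmax < 0:
        raise ValueError("substrate and vmax must be >= 0 and Km must be > 0")
    return vmax * substrate_mM / (km_mM + substrate_mM)


# Values quoted in the text for the MV 3Cpro on its 16-residue 2C/3A peptide.
MV_KM_MM = 0.3   # mM
MV_VMAX = 5.9    # micromol/min/mg

# Illustrative assumption only: an enzyme with a 7-fold higher Km and the SAME
# Vmax. The text does not report Vmax for the other enzymes.
HYPOTHETICAL_KM_MM = 7 * MV_KM_MM

# At [S] = Km the rate is half of Vmax by definition.
rate_at_km = michaelis_menten_rate(MV_KM_MM, MV_KM_MM, MV_VMAX)

# Same substrate concentration, 7-fold higher Km: the rate drops to Vmax / 8.
rate_high_km = michaelis_menten_rate(MV_KM_MM, HYPOTHETICAL_KM_MM, MV_VMAX)
```

Under these assumptions, at a substrate concentration of 0.3 mM the higher-Km enzyme would work at one quarter of the rate of the MV enzyme, unless its Vmax is larger, which is consistent with the remark that some of these enzymes nevertheless cleaved substrates at a higher rate.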
Peptides corresponding to a variety of natural cleavage sites were cleaved by the HRV-14 and HAV 3Cpro with different efficiencies, which correctly reflected the kinetics of the corresponding cleavages observed in the respective polyproteins [128, 150]. Substrates containing primary cleavage sites were processed most efficiently by 3C proteinases; the structure of these substrates most resembled that of the prototypical sites. For the different residues surrounding the cleavage sites, a good correlation between conservation and intolerance to substitutions was observed. Elegant genetic experiments with PV confirmed that the principal cleavage site determinants for 3Cpro occupy the P4, P1, and P1' positions of the substrate [152, 153]. In other viruses that employ 3CL proteinases, the substrate specificity is sometimes determined by other positions, which however always include the P1 residue [12,13a,150]. In contrast, in substrates recognized by the PV 2Apro, P2 and P1' are the most important positions [123a,127]. In substrates cleaved by two different PL proteinases, the alphavirus nsP2 and the coronavirus PL1pro, P2 and P1 were shown to be determinants of specificity [135, 137, 139]. An interesting observation was made with respect to the primary structure of the only known site cleaved by the HepCV NS2-3 proteinase. Although the structure of this site is highly conserved in the different strains of HepCV, this conservation does not seem to be related to the recognition of this site by the proteinase [147a]. Variations in the structure of different cleavage sites recognized by the same proteinase, along with other factors (see above), probably provide an effective mechanism to direct limited proteolysis of polyproteins in a certain order and at a desired rate. Although optimal for its biological role, the structure of a natural substrate may differ from that of the site which is cleaved most efficiently by its cognate proteinase.
This conclusion was drawn from the observation that heterologous peptide substrates were processed by the HRV-14 and MV 3Cpro with efficiencies that were comparable to or even higher than those observed for the corresponding authentic sites [129, 150]. Cysteine proteinases may interact not only with their substrates for proteolysis but also with a variety of other molecules. These interactions can modulate proteolytic activity or can be essential for nonproteolytic activities of the proteinase, as was demonstrated for the picornavirus 2Apro and 3Cpro and proteinases of other viruses. The PV 3Cpro probably interacts with 3Dpol in the 3CD precursor. In this 3CD covalent complex, 3Dpol appears to be an activator of the proteolytic activity in trans of 3Cpro towards the P1 precursor. Furthermore, 3Cpro serves as an inhibitor of the RNA-dependent RNA polymerase activity of 3Dpol [154, 155]. The 3CD precursor was shown to be slowly autoconverted into 3Cpro and 3Dpol, and this process was efficiently stimulated by 3AB molecules [156]. It remains to be shown whether 3Cpro or 3Dpol (or both) interact with 3AB. The 3CL proteinases of other animal and plant viruses may be involved in protein-protein interactions similar to those described for the PV 3Cpro [125, 145, 157, 158]. The PV and HRV-14 3CDpro (3Cpro) proteins also possess RNA-binding properties. The cloverleaf structure, which is conserved within the first 100 nucleotides of the 5'-NTR of the viral RNAs, was recognized by these proteinases [99-102]. This interaction is crucial for the formation of the RNP complex that is implicated in (+)RNA synthesis (see above). The RNA-binding and proteolytic activities of 3CDpro or 3Cpro could be differentially inactivated and, therefore, they may be independent [100, 101]. The HRV-2 2Apro proteolytic activity strictly depends on the presence of Zn2+, which was found to be tightly bound to the proteinase molecules in an equimolar ratio [159].
The bound Zn2+ may mediate a novel RNA- or protein-binding activity, which was suggested for 2Apro [159, 160] because of its involvement in RNA replication [97] and transactivation of translation [92]. The HepCV NS2-3 appears to be another Zn-dependent proteinase: its activity was stimulated threefold by 2 mM Zn2+ [25]. PL proteinases may also possess multiple activities. SinV nsP2 seems to interact with domains of nsP1 and nsP3 which profoundly modulate its trans cleavage activity [161]. In the MHV ORF1a polyprotein, cleavage of the N-terminal p28 by PL1pro depended on a long protein spacer separating the target site and the proteinase [19,138] (Fig. 1C). The LDV/PRRSV nsp1α proteinase contains conserved sequence motifs which are probably related to an as yet unidentified nonproteolytic activity of this protein [20]. A remarkable mechanism of activation was described for the Ad 23K proteinase. The adenovirus enzyme was shown to cleave peptide substrates only in the presence of a disulfide-linked dimer of an 11-residue peptide which was normally derived from the C-terminus of a proteinase substrate, the capsid protein pVI [162, 163]. This peptide appears to be involved in a thiol-disulfide exchange with the proteinase molecules and this process activates the 23K proteolytic activity. The same activation mechanism probably operates in vivo. The Ad proteinase may also bind to DNA because it was nonspecifically stimulated by this compound [162].

The HAV and HRV-14 3Cpro adopt variants of the CHL fold which is composed of two β-barrels, each containing six antiparallel β-strands [5,13] (Fig. 2). There are also peculiarities in the design of the 3Cpro structures which were not found in cellular CHL enzymes. The 3Cpro structures of the two viruses do not contain disulfide bridges and their N-terminus is folded in a unique α-helix. In addition, the HRV-14 3Cpro contains a 3₁₀ helix rather than the canonical α-helix at the C-terminus.
The HAV and HRV-14 3Cpro are only distantly related (approximately 20% identical residues), and the sequence similarity between many other viral enzymes of this family is even less obvious. Nevertheless, comparative sequence analyses and modeling studies suggest that other 3CLpro and 2Apro share the main features of the structural organization of the HRV-14 and HAV 3Cpro [49, 50]. However, the differences between these proteinases may be significant as well. The structural integrity of 2Apro specifically required Zn2+. Structural information on other proteinases is rather scarce. A very preliminary model of a PL fold was reported for the FMDV Lpro [8] and the X-ray structure determination of this enzyme is now in progress [164b]. The Ad2 23K, the structure of which is currently being analyzed [165], was found to be a monomer that does not contain disulfide bonds [120,163]. (See also Note added in proof.)

Catalytic system

Viral cysteine proteinases employ different catalytic systems which seem to be original variants of the well-known catalytic triads identified in cellular enzymes. The HRV-14 and HAV 3Cpro contain a triad of residues which are spatially equivalent to the catalytic Ser-His-Asp of cellular CHL proteases and which are embedded in similar folds. In both enzymes the catalytic serine is replaced by cysteine and, in addition, in the HRV-14 3Cpro the catalytic aspartic acid has been substituted by glutamic acid [5, 13]. The catalytic Cys146, His40, and Glu71 of the HRV-14 3Cpro are in reasonable positions and orientations to form a charge-relay system that is similar, but not identical, to that of serine proteases. In particular, Glu71 adopts an unusual antilone conformation to fit a space occupied by the smaller aspartic acid residue in serine proteases. The oxyanion hole involved in the stabilization of the tetrahedral transition complex is shifted 1-1.2 Å away in the cysteine proteinase.
Surprisingly, a different active site organization was revealed for the HAV 3Cpro. The side chain of Asp84, which is the counterpart of HRV-14 3Cpro Glu71, was shown to be oriented away from the catalytic histidine. Thus the HAV 3Cpro likely employs a diad rather than a triad of catalytic residues [13,60] (Fig. 2). The Cys-His-Asp(Glu) triad, together with the glycine residue that follows the catalytic cysteine, are the only conserved residues in the primary structures of 2A and 3CL proteinases [166] (Fig. 3). The involvement of the cysteine in catalysis was supported by the results of inhibitor studies with numerous proteinases. Moreover, the formation of a thiohemiacetal between the HAV 3Cpro and a peptide-aldehyde inhibitor was recently demonstrated [167]. The importance of the triad residues for proteolytic activity of the CHL proteinases of different picorna-, como-, nepo-, calici-, poty-, and coronaviruses was confirmed by site-directed mutagenesis. Furthermore, some Cys-to-Ser proteinase mutants retained partial [7, 14, 16, 67, 122, 168, 169] or even full [146] activity and acquired sensitivity to serine proteinase inhibitors [168]. The latter finding supported the genuine character of the relationship between cysteine and serine proteinases of the CHL family. In the same way, it was demonstrated that in a number of proteases glutamic acid and aspartic acid are (partly) interchangeable at the third position in the catalytic triad [7, 16, 101, 169–171]. The special character of this interchangeability is underscored by the observations that proteolytic activity was much more sensitive to replacement of the catalytic acidic residue by asparagine or glutamine [7, 14, 16, 100, 169–172]. The only apparent exception was reported for coronavirus 3CL proteinases. A putative active site aspartic acid (glutamic acid) was tolerant to substitution by glutamine in the IBV and by alanine or proline in the MHV 3CL proteinases [67,173a].
In the human coronavirus 229E (HCV) an asparagine is present at this position [77] (Fig. 3), and replacement of this residue by glycine or proline did not impair proteolytic activity in a peptide cleavage assay (J. Ziebuhr, G. Heusipp and S.G. Siddell, personal communication). These results and those of the James group [13] raise questions about the possible role(s) of the acidic residue in catalysis, and the possibility exists that viral CHL proteinases may differ in this respect.
Unlike their CHL counterparts, a number of viral PL proteinases were sensitive to specific inhibitors of papain, e.g. L-trans-epoxysuccinyl-leucylamido(4-guanidino)butane (E64) or its derivatives [173b, 174]. In line with this observation, a catalytic Cys-His diad of the PL type was identified in these and other viral PL proteinases by comparative sequence analysis and site-directed mutagenesis. At the primary structure level, the similarity between viral and cellular PL proteases is concentrated mainly around the catalytic cysteine residue; this is also true for the group of viral PL proteinases by itself (Fig. 3). No apparent counterpart of the papain Asn175 residue, which is considered to be the third member of the catalytic triad in some models, was recognized in any of the RNA-virus-encoded proteinases. Furthermore, another hallmark of cellular PL proteases, a tryptophan residue following the catalytic cysteine, is substituted in many viral proteinases by other aromatic or aliphatic residues, or even by arginine [23,28–30,33] (Fig. 3). Unlike the observations made for CHL proteinases (see above), the Cys-to-Ser mutants of all viral PL proteinases tested thus far were completely inactive [9, 19, 20, 26, 28–30, 33, 69]. This may reflect differences in the catalytic mechanisms employed by the two groups of viral cysteine proteinases. A putative catalytic diad of cysteine and histidine was also identified in the EAV nsp2 [21], HepCV NS2-3 [24,25; A.E.
Gorbalenya, unpublished data], and Ad 23K proteinases [63, 64], whose relationship with established proteinases is uncertain (Fig. 3). Of particular interest is the residual proteolytic activity of the C104S catalytic site mutant of Ad2 and its inactivation by diisopropylfluorophosphate [63]. The proteinases mentioned above contain conserved acidic residues which might function in catalysis, but their role has not yet been elucidated. (See also Note added in proof.)
So far, determinants of substrate recognition have been characterized only in 3CL proteinases. His160 and Thr141 of the HRV-14 3Cpro were found to form the S1 subsite of the substrate-binding pocket and are hydrogen-bonded to the carboxamide group of the conserved glutamine residue in a 6-aa peptide substrate. The latter was modeled on the tertiary structure of the enzyme, a process that was guided by the organization of the complex between trypsin and the Bowman-Birk inhibitor [5]. The threonine and histidine residues of the substrate-binding pocket are conserved in the 3CL proteinases of most, but not all, viruses [49, 50]. There is a perfect correlation between the presence of the substrate-binding pocket histidine in 3CLpro and the ability of the proteinase to cleave after a glutamine or glutamic acid residue (Fig. 3). Substitution of this histidine residue was absolutely deleterious for any of the 3CL proteinases under any of the experimental conditions tested [7, 48, 100, 104, 141, 142, 175, 176]. A similar observation was made for the nepovirus 3CL proteinases, in which a leucine occupies the position in the substrate-binding pocket that is normally occupied by histidine (Fig. 3). Replacement of this leucine residue by histidine inactivated the proteinase [143, 146]. The substrate-binding pocket threonine tolerated a conservative serine substitution fairly well but was sensitive to another substitution [104, 175, 176]. A PV Thr142-to-Ser mutant gave a viable small-plaque progeny [104].
Interestingly, a viral serine rather than a cysteine enzyme with structural and functional characteristics of a 3CL proteinase was recently described for arteriviruses [141] (Figs. 1C and 2). In the HRV-14 3Cpro, the residue that could determine its specificity for a small residue in the P1' position was not identified. The S5–S2 subsites of the substrate-binding pocket, which interact with the less conserved P5–P2 positions of the cleavage sites, were recognized in the enzyme-substrate model of the HRV-14 3Cpro as well [5]. 3C proteinases may employ an original mechanism for coupling substrate recognition with catalysis [5]. The binding of a cognate substrate by the proteinase may channel binding free energy into the stabilization of an unusually flexible loop comprising (among others) the catalytic cysteine and two glycine residues of the oxyanion hole. This energy transfer would promote cleavage of the peptide bond [5].
In a number of proteinases, residues that are likely to be involved in a variety of interactions with molecules other than substrates were identified. The site-directed mutagenesis of two discontinuous regions, enriched in aromatic and basic amino acid residues, implicated them in mediating the binding of 3CDpro to the cloverleaf structure of the PV and HRV-14 5'-NTR [100–102, 104]. These regions include a large part of the interdomain and D2-E2 loops, respectively, and are spatially juxtaposed [5]. Among the interacting residues is a prominent tripeptide PheArgAsp in the interdomain loop which is conserved only in picornavirus 3Cpro [50, 177] (Fig. 2). The presence of this peptide in a proteinase may be indicative of the enzyme's RNA-binding activity. A number of additional amino acid residues from the N- and C-terminal regions of 3Cpro may also contribute to the RNA-binding surface formed on the opposite side of the active site cleft of the molecule [5].
RNA-binding activity was also claimed, but has not yet been proven, for the entero/rhinovirus 2Apro. Two sets of residues which might mediate such an activity have already been identified. The first set includes residues which could be changed to generate suppressor mutations for substitutions made in a region of the 5'-NTR. This region lies downstream of the cloverleaf structure and promotes cap-independent translation of viral RNA. The mutated residues are scattered in the polypeptide chain with a possible spatial concentration in two distinct areas far from the active site region of 2Apro [92]. These areas were implicated in the interaction with this region of the 5'-NTR [92]. The second set is represented by a finger-like motif CXC-Xn-CXH which was implicated in Zn²⁺ binding [160,164a]. The spatial organization of the finger may distantly resemble that of the RNA-binding surface of 3Cpro. There is, however, a difference between the two apparently analogous structures. Unlike the 3Cpro residues involved in RNA binding, the finger-forming residues of 2Apro cannot be mutated without the loss of proteolytic activity [160] because of their apparent role in maintaining the structural integrity of 2Apro [164a]. Interestingly, a structurally similar finger seems to be conserved in the HepCV NS2-3 proteinase (A.E. Gorbalenya, unpublished data) and replacements in it diminished the proteolytic activity of this proteinase as well [25].
Site-directed mutagenesis also helped to identify a cysteine residue likely to be involved in the activation of the Ad 23K proteinase. The conserved Cys122 was implicated in the thiol-disulfide exchange with the activating 11-residue peptide [63, 64]. (See also Note added in proof.)
After the identification of a CHL fold in the 3CL viral cysteine proteinases (Fig. 2), the common ancestry of this group and the canonical CHL serine proteases is no longer a matter of debate.
Although these findings do not reveal the direction of protease evolution, independent considerations suggest that the serine version of the CHL protease most likely originated from a cysteine-based ancestor [178,179]. Hence, the 2Apro and 3CL cysteine proteinases may have descended directly from the primordial CHL enzyme [79]. The subsequent change of catalytic nucleophile may also have occurred during the evolution of viruses or their predecessors. This can be concluded from the observation that the 3CL proteinases of the related coronaviruses and arteriviruses, which appear to be descendants of the same ancestral proteinase [141], are cysteine and serine enzymes, respectively (Figs. 1C and 3). A common origin is also likely for viral and cellular PL proteases, although a lack of structural data leaves this relationship without firm evidence so far. Viral PL proteinases have probably been able to explore 'evolutionary space' at a scale not reached by their cellular counterparts. This could explain their broad diversity, both in size and in primary structure (Fig. 3). Duplications of a PL proteinase domain may have occurred independently in several viral lineages [20, 76, 78]. A possible product of such a duplication, the arterivirus nsp2 proteinase, combines PL and CHL features in its primary structure (Fig. 3) and its origin is intriguing [21].
Viral (cysteine) proteinases are among the most attractive targets for antiviral chemotherapy [180, 181]. This promise has yet to be fulfilled, although a variety of compounds can inhibit viral cysteine proteinases in vitro and, in some cases, also in vivo. The anti-papain drug E64 or its lipophilic derivatives inhibited the FMDV Lpro and the MHV PL1pro in vitro and also inhibited virus replication in tissue culture [173b,174].
Elastase inhibitors like methoxysuccinyl-Ala-Ala-Pro-Val-chloromethyl ketone (MPCMK) and elastatinal were effective to some extent against the PV and HRV-14 2Apro [182], and the latter also inhibited the HAV 3Cpro in vitro [128]. In addition, MPCMK partly inhibited virus-specific protein synthesis in PV-infected cells [182]. Spiro indolinone β-lactam was synthesized and shown to be a good inhibitor (IC50 = 20 μg/ml) of the HRV and PV 3Cpro [183]. A compound named thysanone was isolated from the fungus Thysanophora penicilloides and inhibited (IC50 = 13 μg/ml) the HRV 3Cpro [184]. This was also the case for the related polyketide antibiotic kalafungin (IC50 = 3.3 μM) and two other substances of microbial origin, the phytotoxin radicinin (IC50 = 500 μM) and citrinin hydrate (IC50 = 280 μM) [185]. The latter two compounds were selected from 20 000 sources employing an original bacterium-based system specially designed for the large-scale screening of anti-proteinase drugs [185]. Recently, two groups reported on HAV and HRV 3Cpro inhibitors which mimic cognate peptide substrates and contain a C-terminal glutamine residue that is modified with a chemically active group [167, 186]. Another inhibitor of the HAV 3Cpro, the peptide-aldehyde acetyl-Leu-Ala-Ala-(N,N'-dimethylglutaminal), was 50-fold more active against this enzyme than against the related HRV-14 3Cpro and had a Ki of 4.2 × 10⁻⁸ M [167]. Similarly, P4-P1 oligopeptides Leu-Ala-Gly-Gly with a C-terminal dimethylacetal or nitrile group were good substrate-resembling inhibitors of the Ad2 23K proteinase [187].
The number of established viral cysteine proteinases has been expanded enormously and now includes both RNA- and DNA-encoded enzymes. All (+)RNA viruses of vertebrates and a significant number of plant RNA viruses employ cysteine proteinases or their serine homologues to control virus-specific processes and to modify the host cell.
Proteinases can act in cis and/or trans and may display additional, nonproteolytic protein- and RNA-binding activities. The inactivation of proteinases is deleterious for virus reproduction and this makes these enzymes attractive targets for the therapy of many hazardous virus infections. The coding sequences for a large number of proteinases have been cloned, sequenced, and analyzed, and a few picornavirus enzymes were characterized in more detail. Extensive variations of only two structural themes, characteristic of the cellular enzymes chymotrypsin and papain, were identified in viral cysteine proteinases. This structural conservation, together with the narrow and conserved specificity of viral proteinases, will allow the development of general approaches to design anti-proteinase drugs to be used against a variety of viruses. Also, substances which interfere with nonproteolytic activities of viral proteinases could be developed. To accomplish these goals, the structure-function relationships of viral proteinases (and their precursors) and the proteolytically active complexes which they form should be analyzed in detail for prototypes of all major groups of viruses.
We are grateful to all colleagues who supplied us with reprints of their papers used for this survey. Our special thanks to Ernst Bergmann, Xuemei Chao, Mark Denison, Michael James, Kathie Kean, Ann Palmenberg, Bert Semler, Susan Weiss, Eckard Wimmer, and John Ziebuhr who made unpublished results available to us. This work was supported in part by a grant from the RFBR to A.E.G.
Note added in proof
Very recently, the three-dimensional structure of the complex between the Ad 23K proteinase and its activating peptide was solved by X-ray analysis [188]. A unique fold for this enzyme was revealed. The activating peptide, residing far from the active site, extends a β-sheet of the enzyme core. In contradiction to the current model [63, 64], which has been described in this review, Cys104 has been implicated in the thiol-disulfide exchange and activation, and Cys122, in addition to His54 and Glu71, has been proposed to be part of the catalytic triad. This triad can be superimposed on the active site Cys-His-Asn residues of papain. The tertiary structure model of Ad 23K suggests that variants of a portion of the activating peptide may be useful for designing antivirus drugs.