key: cord-006934-92ctgc4n authors: Barrett, Alan J.; Rawlings, Neil D. title: Families and clans of cysteine peptidases date: 1996 journal: Perspect Drug Discov Des DOI: 10.1007/bf02174042 sha: doc_id: 6934 cord_uid: 92ctgc4n
The known cysteine peptidases have been classified into 35 sequence families. We argue that these have arisen from at least five separate evolutionary origins, each of which is represented by a set of one or more modern-day families, termed a clan. Clan CA is the largest, containing the papain family, C1, and others with the Cys/His catalytic dyad. Clan CB (His/Cys dyad) contains enzymes from RNA viruses that are distantly related to chymotrypsin. The peptidases of clan CC are also from RNA viruses, but have papain-like Cys/His catalytic sites. Clans CD and CE contain only one family each, those of interleukin-1β-converting enzyme and adenovirus L3 proteinase, respectively. A few families cannot yet be assigned to clans. In view of the number of separate origins of enzymes of this type, one should be cautious in generalising about the catalytic mechanisms and other properties of cysteine peptidases as a whole. In contrast, it may be safer to generalise for enzymes within a single family or clan.
Peptidases in which the thiol group of a cysteine residue serves as the nucleophile in catalysis are defined as cysteine peptidases. In all the cysteine peptidases discovered so far, the activity depends upon a catalytic dyad, the second member of which is a histidine residue acting as a general base. The majority of cysteine peptidases are endopeptidases, but some act additionally or exclusively as exopeptidases. The archetypal cysteine peptidase is papain, from the latex of the unripe fruit of Carica papaya. Subsequently, cysteine peptidases have been found in nearly every kind of organism, from RNA and DNA viruses to eubacteria, protozoa, fungi, plants and animals.
As yet, no archaebacterial cysteine peptidase has been sequenced, but there is evidence that such enzymes exist [1, 2]. The first cysteine peptidase for which the amino acid sequence was determined was papain [3]. For some time, it seemed that all cysteine peptidases might be relatives of papain, but as the data have accumulated, it has become clear that the true situation is very different. With about 500 amino acid sequences of cysteine peptidases now available, the papain family is indeed by far the largest, but there are also many other families, each with its distinctive characteristics. The purpose of the present article is to describe these families, and to consider how they may be arranged in larger groups that reflect several separate evolutionary origins of cysteine peptidases. We shall also summarise the biological functions of cysteine peptidases in the various families that make many of them potential therapeutic targets.
*To whom correspondence should be addressed. Perspectives in Drug Discovery and Design, Vol. 6, pp. 1-11 0928-2866/$ 6.00 + 1.00 © 1996 ESCOM Science Publishers B.V.
We have compared all the available amino acid sequences of cysteine peptidases by methods described elsewhere [4]. This has led us to identify 35 different families of cysteine peptidases, which we number 'C1', 'C2', and so on (Table 1). However, we consider it unlikely that there were 35 unique evolutionary origins of cysteine peptidases. It is much more probable that some families have common ancestors, but that these are so ancient that no significant similarities in amino acid sequence survive. We term a set of families distantly related in this way a 'clan' (named 'CA', 'CB', and so on). Since the members of a clan cannot be identified by the statistical analysis of amino acid sequences, other methods are needed. The best method of detecting very distant relationships between proteins, or excluding them, is the comparison of tertiary structures, when these are available.
To date, three-dimensional structures have been determined for members of four families of cysteine peptidases. The available structures are those of papain and several other members of family C1 [5], hepatitis A virus endopeptidase (family C3) [6], interleukin-1β-converting enzyme (family C14) [7, 8] and the human adenovirus-2 proteinase (C5) [9] (Fig. 1). The structures seen in these four families are so different that it is virtually inconceivable that they had a common ancestor. We therefore take these families to represent four separate evolutionary origins of cysteine peptidases, or four separate clans. As we shall see, however, one of the clans of cysteine peptidases shares ancestors with a group of serine peptidases. Relationships between families may also be revealed or excluded by examination of the amino acid sequences in the most highly conserved regions of the polypeptide chains, those forming the catalytic site. The order of occurrence of the residues of the catalytic dyad in the linear sequence can be crucial, since one can be certain that a peptidase in which the active site residues occur in the order Cys, His cannot adopt the same fold as a peptidase in which the order is His, Cys. Both Cys/His and His/Cys arrangements of catalytic dyad residues occur amongst cysteine peptidases, and these help us to assign the families to clans. The amino acids surrounding the catalytic residues tend to be much more highly conserved than those elsewhere in the sequences and, accordingly, the sequences of these can give further clues to the grouping of families in clans.
Clan CA contains the families that had a common origin with papain. These have the Cys/His type of catalytic dyad, and show other similarities in the structure of the catalytic site. In the papain family, Gln 19, located in the active site wall, is important in defining the 'oxyanion hole', and Asn 175 has a role in orientating the imidazolium ring of the catalytic His 159 [10].
Because of their important functions, these amino acids are absolutely conserved in the family, and also, the residue following the catalytic Cys 25 is invariably one with a bulky, hydrophobic side chain. Of the other families of peptidases known to have the Cys/His catalytic dyad, only those of calpain (family C2) and streptopain (C10) have the glutamine, asparagine (aspartic acid in streptopain) and bulky hydrophobic residues in the expected positions. The sequences forming the environments of the catalytic residues in these families also show other similarities to papain, and we therefore group families C1, C2 and C10 in clan CA. With few exceptions, the endopeptidases of clan CA show a strong preference for hydrolysis of bonds in which the P2 residue is a hydrophobic one. These enzymes are also more rapidly inactivated by iodoacetate than by iodoacetamide [11]. This is contrary to the intrinsic reactivities of these compounds with free thiol groups, and is dictated by a charge distribution in the enzyme active site that seems to be peculiar to clan CA. This pattern of reactivity is not seen with clostripain in family C11 [12] or legumain in C13 [13, 14].
The papain family has more known members than any other family of cysteine peptidases. Generally, these are synthesised as preproproteins that acquire activity only after the removal of the N-terminal peptides. Exceptions are the cytoplasmic peptidases, bleomycin hydrolase, which is named for its detoxification of an anticancer drug, and bacterial aminopeptidase C. Bleomycin hydrolase occurs, in yeast at least, as a hexamer with DNA-binding activity [15]. More typical members of the family are the proteinases of the food vacuoles of protozoa, and the lysosomal proteinases, cathepsins B, H, L, K, S and others in animals.
As is fully described elsewhere in the present volume, the papain-like proteinases of parasites often play important roles in the destruction of host proteins [16], in the nutrition of the parasite [17], and in the neutralisation of the host immune response [18, 19]. In the lysosome, enzymes of this type contribute to the turnover of cellular proteins, and also act in phagocytic cells to digest the proteins of bacteria. This activity contributes antigenic peptides to the major histocompatibility class II system, leading to the production of antibodies [20]. The lysosomal enzymes can also be secreted into the extracellular matrix, where they contribute to tissue remodelling in cartilage and bone [21, 22], and possibly also in tumour invasion [23]. Several studies have implicated endosomal/lysosomal endopeptidases of family C1 in the pathology of Alzheimer's disease [24, 25], and in another form of amyloidosis [26]. Endopeptidases of family C1 occur in just one group of viruses, the baculoviruses. One of these is responsible for the degradation of actin during the liquefaction of the tissues of the insect host [27].
The endopeptidases of the calpain family, C2, are cytoplasmic, calcium-dependent enzymes. Their natural substrates include membrane proteins such as spectrin [28], and a number of cytoskeletal proteins [29, 30]. Calpain activity is strongly implicated in several forms of neurodegeneration [30]. Calpain is required for normal muscle development [31], but there is a muscle-specific calpain that is believed to be implicated in a form of muscular dystrophy [32].
Streptopain (family C10) is secreted as a proenzyme by the bacterial pathogen Streptococcus pyogenes. It is a virulence factor, also known as pyrogenic exotoxin B. The enzyme contributes to the tissue injury of adult respiratory distress syndrome [33].
The amino acid sequences of the cysteine peptidases of picornaviruses (picornains 2A and 3C) (family C3) hint at a relationship with chymotrypsin [34], and when the three-dimensional structure of picornain 3C from the human hepatitis A virus was determined, it showed unmistakable similarities to that of chymotrypsin and other members of clan SA [6, 35]. Evidently, catalytic serine and cysteine residues have been interchanged in evolution, although it is not clear which came first. Picornain 3C retains the equivalent of His 57 of chymotrypsin, but has no homologue of Asp 102, the third member of the catalytic triad [6]. Because of this distant relationship to chymotrypsin, the picornains and their relatives are commonly (if somewhat confusingly) termed 'chymotrypsin-like cysteine proteases'.
There are many other cysteine peptidases with His/Cys dyads in the RNA viruses, and all of these are assigned to families in clan CB. The evolutionary relationships amongst the RNA viruses are obscure, but there are good reasons for thinking that they have common ancestry. One line of evidence for this is the fact that the viral polyproteins contain some homologous proteins, which tend to be arranged in the same order. These include a helicase and an RNA-directed RNA polymerase, and, in the polyprotein of the polio virus, picornain 3C lies between the helicase and the polymerase. The helicase-peptidase-polymerase sequence of genes is also seen with the endopeptidases of tobacco etch virus (C4), feline calicivirus (C24), Southampton virus (C37) and parsnip yellow fleck virus (C38). The endopeptidases of all five families contain the His/Cys dyad, but they are too dissimilar to be placed in one family. The simplest explanation for this situation is that the viral polyproteins are homologous, but that the peptidase domain has a high rate of mutation relative to the helicase and polymerase. On these grounds, families C3, C4, C24, C37 and C38 are assigned to clan CB.
Peptidases from family C30 also have a His/Cys dyad and are tentatively included in clan CB. All the peptidases in clan CB are from RNA viruses, and they are responsible for proteolytic processing of the pol polyproteins (the polyproteins containing the RNA-directed RNA polymerase) in which they occur, commonly hydrolysing glutaminyl bonds. The picornains are not confined to action on the viral polyproteins, but also hydrolyse some host proteins. Thus, picornain 2A cleaves the p220 subunit of the eukaryotic initiation factor 4G into two fragments, one directing helicase activity [36]. Picornain 3C seems to be responsible for the degradation of microtubule-associated protein 4 that is correlated with the collapse of the microtubules in the infected cell [37]. Through their actions on the polyprotein, and probably in other ways, the endopeptidases of clan CB perform vital functions in the replication of the viruses, many of which cause important diseases in animals and plants. This makes the cysteine peptidases of clan CB a target for the design of inhibitors that might have biomedical applications.
There are additionally many polyprotein processing endopeptidases from RNA viruses that have the Cys/His catalytic dyad, and also show the bulky hydrophobic amino acid following the catalytic cysteine that is characteristic of the papain clan. Accordingly, these cysteine peptidases are often termed 'papain-like' by virologists. However, none of them possesses residues equivalent to Gln 19 and Asn 175 of papain, which implies a different catalytic mechanism and quite possibly a different origin and tertiary structure. These viral peptidases are therefore not included in clan CA. There are no fewer than 17 families of the 'papain-like' Cys/His viral peptidases, and they are tentatively assigned to a separate clan, CC.
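The sequence-level criteria used so far to separate clans CA, CB and CC can be summarised as a small decision rule. The Python sketch below is only an illustration under simplifying assumptions: the function names `dyad_order` and `suggest_clan` and their parameters are hypothetical, the rule covers only the three clans discussed to this point, and it ignores the tertiary-structure evidence on which clan assignments ultimately rest.

```python
def dyad_order(cys_position: int, his_position: int) -> str:
    """Order of the catalytic dyad residues along the polypeptide chain."""
    if cys_position == his_position:
        raise ValueError("Cys and His cannot occupy the same position")
    return "Cys/His" if cys_position < his_position else "His/Cys"


def suggest_clan(cys_position: int, his_position: int,
                 has_gln_asn_equivalents: bool, from_rna_virus: bool) -> str:
    """Suggest a clan from the sequence criteria described in the text.

    has_gln_asn_equivalents: residues equivalent to Gln 19 and Asn 175 of
    papain are present (Asp in place of Asn, as in streptopain, counts).
    Anything that fits none of the three patterns is left unassigned.
    """
    order = dyad_order(cys_position, his_position)
    if order == "Cys/His" and has_gln_asn_equivalents:
        return "CA"  # papain-like catalytic site
    if order == "His/Cys" and from_rna_virus:
        return "CB"  # chymotrypsin-like viral peptidases
    if order == "Cys/His" and from_rna_virus:
        return "CC"  # 'papain-like' viral peptidases lacking Gln/Asn
    return "unassigned"


# Papain: Cys 25 precedes His 159, and Gln 19 and Asn 175 are present.
papain_clan = suggest_clan(25, 159, True, False)
# Hypothetical viral peptidase with His before Cys (positions invented).
viral_clan = suggest_clan(150, 40, False, True)
print(papain_clan, viral_clan)
```

A rule of this kind can only propose a clan. As the text stresses, a shared dyad order is necessary for a shared fold but does not establish one.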
There are structural similarities between the polyproteins from turnip yellow mosaic virus (C21), blueberry scorch virus (C23), apple chlorotic leaf spot virus (C34) and apple stem grooving virus (C35) implying a common ancestry, making it likely that their peptidases are homologous. In processing the viral polyproteins, the endopeptidases of clan CC most commonly hydrolyse glycyl or alanyl bonds. They do not show the specificity for a hydrophobic P2 residue that is seen in the great majority of the endopeptidases of clan CA. Like the viral enzymes of clan CB, they probably also hydrolyse some host proteins. The L-peptidase of foot-and-mouth disease virus is known to cleave the p220 subunit of the eukaryotic initiation factor 4G at -Gly479+Arg-, a requirement for successful viral replication [38].
Clan CD is so far represented by only a single family, family C14 of interleukin-1β-converting enzyme. These are cytosolic endopeptidases known only from animals. The three-dimensional structure of interleukin-1β-converting enzyme shows a distinctive protein fold, and thus a separate origin for this group of cysteine peptidases [7, 8]. Remarkably, the several endopeptidases now known from family C14 all show strict specificity for the cleavage of aspartyl bonds. The interleukin-1β-converting enzyme itself is synthesised as a proenzyme that is activated by the cleavage of aspartyl bonds, and then cleaves more aspartyl bonds to generate the active form of interleukin 1β [39]. Other members of the family are intimately involved in the process of apoptosis (programmed cell death) [40].
Again, clan CE contains just the single family (C5) of the adenovirus endopeptidase L3/p23, which possesses a protein fold that is unique [9], and thus demands a further clan. The adenovirus endopeptidase has two activators: an 11-residue oligopeptide derived from the C-terminus of the structural protein pVI of the virus, and adenoviral DNA [41, 42].
The covalently bound activating oligopeptide is visible in the reported structure and, surprisingly, is not close to the active site [9]. Unlike the majority of the viruses that have been mentioned above, the adenoviruses are DNA viruses. They therefore have no requirement for a polyprotein processing endopeptidase, and the virally encoded enzyme serves other functions essential for infection of the host cell [43].
Families not yet assigned to clans
As described above, we have assigned the majority of the known families of cysteine peptidases to clans, but there remain a few that we cannot yet classify.
Family C11 contains clostripain, an endopeptidase secreted by the anaerobic, Gram-positive eubacterium Clostridium histolyticum, which shows strict specificity for arginyl bonds. The enzyme is synthesised as a precursor that is activated by the removal of an N-terminal propeptide and also the separation of two chains, with the loss of a nonapeptide, so that the mature enzyme has a light chain (containing the catalytic cysteine) and a heavy chain [44]. Unlike papain, clostripain is calcium-dependent, is more rapidly inactivated by iodoacetamide than by iodoacetate, and is not irreversibly inhibited by E-64 [12].
Families C12 and C19 contain ubiquitin C-terminal hydrolase 1 and isopeptidase T, respectively. All members of these families seem to catalyse similar reactions, hydrolysing bonds involving the C-terminal carboxyl group of ubiquitin [45]. Ubiquitin molecules are conjugated to intracellular proteins and peptides as signals for rapid degradation by nonlysosomal endopeptidases such as the 26S proteasome, or to serve a chaperone function in the assembly of oligomeric proteins and ribosomes [46]. Ubiquitin is attached via the carboxyl group of its C-terminal glycine residue either to the N-terminus of another polypeptide or to the ε-amino group of a lysine, and in the latter case forms an isopeptide bond.
Subsequently, the ubiquitin C-terminal hydrolases remove the ubiquitin moieties for re-use. There are many ubiquitin C-terminal hydrolases, perhaps as a reflection of a multiplicity of functions associated with subtle differences in specificity. Ubiquitin chains are removed from other proteins, both following chaperonin-mediated folding, and concomitantly with proteolysis by the 26S proteasome. Also, polyubiquitin chains are disassembled, and the activated form of ubiquitin, ubiquitin thiol ester, may be hydrolysed [46]. Family C12 contains peptidases with Mr in the range 20-30 kDa, whereas those in family C19 fall in the range 100-200 kDa. Site-directed mutagenesis has implicated Cys 571 in catalysis by the peptidase that is the product of yeast gene DOA4 [47], and the presumed catalytic histidine seems certain to be C-terminally located, so that these are Cys/His peptidases. We have previously pointed out that there are slight similarities in the amino acid sequence around candidate cysteine and histidine residues in the two families, which, taken together with the similar activities, tend to suggest that families C12 and C19 belong together in a clan [45].
Family C13 contains peptidases including legumain, all of which show strict specificity for the hydrolysis of asparaginyl bonds. Examples are known from plants and animals. In plants, the endopeptidase is vacuolar, and is implicated in both post-translational processing of seed proteins during maturation [48], and degradation of the same proteins during germination [49]. The best-known animal member of the family is from the blood fluke Schistosoma mansoni, in which the enzyme is implicated in host haemoglobin digestion, probably by activation of the Schistosoma procathepsin B [50]. This implies location in the lysosome.
Peptidases of the legumain family differ from papain in their weak inhibition by cystatins and E-64 [14], and their more rapid inactivation by iodoacetamide than by iodoacetate [13, 14].
Family C15 is so far a small one, containing only the eubacterial, cytoplasmic pyroglutamylpeptidase I, an omega peptidase that hydrolyses the N-terminal pyroglutamyl group from a polypeptide. A mammalian enzyme with similar specificity exists, but has not yet been sequenced.
Family C25 contains primarily secreted endopeptidases from the anaerobic, Gram-negative eubacterium Porphyromonas gingivalis. The most fully characterised member of the family is the arginine-specific gingipain R, but there is also a gingipain K that cleaves lysyl bonds. P. gingivalis is a causative agent of gingivitis, and the gingipains have been implicated in several aspects of the disease process [51]. Gingipains R and K are secreted as inactive precursors with N-terminal propeptides. The mature enzymes are multidomain proteins with the endopeptidases at the N-termini and the haemagglutinin domains at the C-termini [52].
Until recently, the cysteine peptidases of the papain family have been given so much more attention than the others that it was easy to assume that all cysteine peptidases were essentially papain-like, but it is now abundantly clear that this is not the case. The three-dimensional structures of enzymes from three other families have been determined, and each has proved to have a unique protein fold, clearly unrelated and novel. It is therefore clear that there are a number of distinct evolutionary lines of cysteine peptidases. There is unequivocal evidence of the biomedical importance of the cysteine peptidases. This is seen in the pathophysiology of man and other animals, in the parasites and bacterial pathogens of man, in the physiology and pathology of plants, and in the replication of countless viruses.
This demands that the peptidases from each of the clans be characterised with the thoroughness that has so far been accorded only to clan CA, and the work is now well underway.
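For readers who wish to work with the classification programmatically, the clan assignments stated in this article can be collected in a simple lookup table. The Python sketch below is a convenience summary, not part of the original classification scheme: the names `CLAN_FAMILIES`, `UNASSIGNED` and `clan_of` are hypothetical, and only families explicitly named in the text are listed (clan CC, for example, is said to contain 17 families, of which four are named here).

```python
from typing import Optional

# Families named in the text, grouped by the clan to which they are assigned.
CLAN_FAMILIES = {
    "CA": ["C1", "C2", "C10"],
    "CB": ["C3", "C4", "C24", "C37", "C38", "C30"],  # C30 is tentative
    "CC": ["C21", "C23", "C34", "C35"],  # named in the clan CC discussion
    "CD": ["C14"],
    "CE": ["C5"],
}

# Families discussed in the text as not yet assigned to a clan.
UNASSIGNED = ["C11", "C12", "C13", "C15", "C19", "C25"]

# Reverse index from family to clan, built once from the table above.
_FAMILY_TO_CLAN = {
    family: clan
    for clan, families in CLAN_FAMILIES.items()
    for family in families
}


def clan_of(family: str) -> Optional[str]:
    """Return the clan of a family, 'unassigned', or None if not listed."""
    if family in _FAMILY_TO_CLAN:
        return _FAMILY_TO_CLAN[family]
    if family in UNASSIGNED:
        return "unassigned"
    return None  # family not named in the text


for name in ("C1", "C30", "C13"):
    print(name, clan_of(name))
```

Returning `None` for families that are not listed keeps the table honest about its coverage: 35 families are recognised in total, and most of those in clan CC are not named individually in the text.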