key: cord-0915745-jrk8yexx
authors: Kasperkiewicz, Paulina; Poreba, Marcin; Groborz, Katarzyna; Drag, Marcin
title: Emerging challenges in the design of selective substrates, inhibitors and activity‐based probes for indistinguishable proteases
date: 2017-01-29
journal: FEBS J
DOI: 10.1111/febs.14001
sha: 8c279f91ec5ec6cbda1cb4242ce8023b77abe326
doc_id: 915745
cord_uid: jrk8yexx

Proteases are enzymes that hydrolyze the peptide bond of peptide substrates and proteins. Despite significant progress in recent years, one of the greatest challenges in the design and testing of substrates, inhibitors and activity‐based probes for proteolytic enzymes is achieving specificity toward only one enzyme. This specificity is particularly important if the enzyme is present with other enzymes with a similar catalytic mechanism and substrate specificity but completely different functionality. The cross‐reactivity of substrates, inhibitors and activity‐based probes with other enzymes can significantly impair or even prevent investigations of a target protease. In this review, we describe important concepts and the latest challenges, focusing mainly on peptide‐based substrate specificity techniques used to distinguish individual enzymes within major protease families.

Proteases are enzymes that hydrolyze peptide bonds within endogenous substrates and peptides. They play a key role in regulating many physiological conditions, and protease activity is dysregulated in many diseases, including cancer, diabetes and neurological disorders [1, 2]. To date, five major families of proteases have been described: serine, cysteine, metallo-, aspartyl and threonine proteases (Fig. 1). The most abundant and explored families are the serine and cysteine proteases, named after the reactive nucleophilic groups in their active sites: the hydroxyl group in serine and the thiol group in cysteine [3]. Mechanistically, proteases hydrolyze peptide bonds in the substrate (endopeptidases) or at the N or C termini (exopeptidases) [3]. Proteinases belonging to the same family, such as caspases, neutrophil serine proteases, aminopeptidases, cathepsins or kallikreins, typically have similar functions and process the same naturally occurring substrates. However, little is known about the individual physiological and pathophysiological functions of some proteases [4-7]. To answer these questions, chemical tools including substrates, inhibitors and activity-based probes (ABPs) have become essential for monitoring changes in the activity of proteolytic enzymes in cells or even in whole organisms (Fig. 2) [8-10].

Abbreviations: AAT, α-1-antitrypsin; Abu, aminobutyric acid; ABP, activity-based probe; ABZ, aminobenzoic acid; ACC, 7-amino-4-carbamoylmethylcoumarin; ACT, antichymotrypsin; AMC, 7-amino-4-methyl-coumarin; AOMK, acyloxymethylketone; CatG, cathepsin G; CLiPS, cellular libraries of peptide substrates; CoSeSuL, counter selection substrate library; DUB, deubiquitylating enzyme; Gr, granzyme; hK, human kallikrein; hLeu, homoleucine; HyCoSuL, hybrid combinatorial substrate library; IQF, internally quenched fluorescent substrate; LtA4h, leukotriene A4 hydrolase; MALT1, mucosa-associated lymphoid tissue lymphoma translocation protein 1; Met(O)2, methionine sulfone; MMP, matrix metalloproteinase; NE, neutrophil elastase; Nle(O-Bzl), 2-amino-6-benzyloxyhexanoic acid; Nle, norleucine; NSP4, neutrophil serine proteinase 4; NSP, neutrophil serine protease; Oic, octahydroindole-2-carboxylic acid; Pal, pyridyl-L-alanine; PR3, proteinase 3; PSA, prostate-specific antigen; PS-SCL, positional scanning synthetic combinatorial library; SAR, structure activity relationship; SAS, substrate activity screening; SENP, human desumoylating enzyme; Tic, 1,2,3,4-tetrahydroisoquinoline-3-carboxylic acid; UCH, ubiquitin C-terminal hydrolase.

Despite the significant progress in recent years, one of the greatest challenges in the design and testing of substrates, inhibitors and ABPs is achieving specificity toward only one enzyme [1] . This specificity is particularly important if the enzyme is in the presence of other enzymes with a similar catalytic mechanism but completely different functions. This cross-reactivity of substrates, inhibitors and ABPs with other enzymes significantly impairs or even prevents investigations of a target protease [4] . The specificity of the substrates, inhibitors and ABPs for proteolytic enzymes is optimized by selecting the appropriate amino acid sequences that interact with the binding pockets of the protease [8, 11] .

Many different approaches have been proposed and used to investigate substrate specificity, including positional scanning synthetic combinatorial libraries (PS-SCL), phage display, hybrid combinatorial substrate libraries (HyCoSuL), counter selection substrate libraries (CoSeSuL), internally quenched fluorescent substrate (IQF) libraries (also called fluorescence resonance energy transfer libraries) and proteomics, with varying results (Fig. 3) [8, 11-15].

The first broad chemical approach to study protease substrate specificity was proposed by Thornberry and colleagues in 1997 [16, 17]. In this concept, the PS-SCL (Fig. 3), the substrate library comprises three 'sublibraries' to study protease preferences at the P4, P3 and P2 positions. For instance, in the P4 sublibrary, this position was fixed with one of the natural amino acids (1-19), the P3-P2 positions were randomized with equimolar mixtures of natural amino acids (making this library useful to study various caspases), and the P1-P1′ positions were occupied by Asp-7-amino-4-methyl-coumarin (AMC). The caspase specificity preferences in the active site pockets were obtained by screening each position separately, making this approach 'positional profiling'. Shortly after, the Ellman and Craik groups utilized a bifunctional 7-amino-4-carbamoylmethylcoumarin (ACC) fluorophore as a reporter group in PS-SCL, which made it much easier to introduce various amino acids at the P1 position [18]. To date, over 100 proteases have been profiled with the PS-SCL approach, ranking this technology among the most useful tools in protease substrate specificity investigations [8]. However, traditional PS-SCL has been demonstrated to be insufficient when analyzing proteases with overlapping substrate specificity profiles (e.g. caspases, cysteine cathepsins, neutrophil proteases and deubiquitylating enzymes (DUBs)). The use of only natural amino acids in a PS-SCL reduces the chance of finding a specific (and/or very active) peptide toward the protease of interest. This drawback was overcome by a novel approach developed in the Drag laboratory [19]. This concept, the HyCoSuL, relies on the use of a broad panel of unnatural amino acids in the substrate sequences, which explore the protease active site much more accurately than a set of 20 natural analogues alone (Fig. 3).
The use of unnatural amino acids in fluorogenic substrate libraries was also applied in the CoSeSuL [12]. This approach is restricted to proteases that display very broad substrate specificity in one (or more) active site pockets. By comparing such a specificity matrix with other proteases displaying narrower specificity in corresponding pocket(s), it becomes possible to distinguish between enzymes. PS-SCL, HyCoSuL and CoSeSuL are valuable tools to study protease substrate specificities; however, they are all limited to the non-prime pockets of the enzyme active site. IQF substrates offer an excellent platform to study protease preferences in both prime and non-prime regions (Fig. 4) [20]. In brief, IQF substrates are peptides of various lengths flanked by a fluorescence group (donor) and a quencher group (acceptor). Such intact peptides are in a quenched state, as the fluorescence from the donor is quenched by an appropriate acceptor. Once the peptide is cleaved by the protease, the peptide fragment with the fluorophore is separated from the quencher, and the fluorescence is liberated. Currently, there is no single, unified technology using IQF substrates in protease screenings, and the architecture of an IQF substrate library is tailored to the protease being investigated. To date, various donor-acceptor pairs have been described in the literature and are reviewed elsewhere [8].
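The positional scanning layout described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' protocol: the residue set (19 natural amino acids with Cys omitted), the one-letter encoding and the function name `ps_scl_sublibrary` are assumptions made for the example.

```python
from itertools import product

# Assumption: 19 natural amino acids, Cys omitted. The exact residue set
# of the original library is not specified in the text.
AMINO_ACIDS = list("ADEFGHIKLMNPQRSTVWY")

POSITIONS = ["P4", "P3", "P2"]


def ps_scl_sublibrary(fixed_position, fixed_residue, p1="D"):
    """Enumerate the P4-P3-P2-P1 sequences contained in one library well.

    One position is fixed to a single residue, the other two are
    randomized over AMINO_ACIDS, and P1 is held constant (Asp by default,
    as in the caspase-oriented library described above).
    """
    choices = [
        [fixed_residue] if pos == fixed_position else AMINO_ACIDS
        for pos in POSITIONS
    ]
    return ["".join(combo) + p1 for combo in product(*choices)]


# One well of the P4 sublibrary: P4 fixed to Trp, P3 and P2 randomized.
well = ps_scl_sublibrary("P4", "W")
print(len(well))   # number of peptides mixed in this single well
print(well[:3])    # first few sequences in the mixture

# Three sublibraries x 19 fixed residues each.
total_wells = len(POSITIONS) * len(AMINO_ACIDS)
print(total_wells)
```

Each well therefore reports the average activity of a whole mixture, which is why a preference read from one position says nothing about cooperativity with the neighboring positions.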

As an alternative to chemical-based approaches, a phage display technology of biological origins has also been extensively developed and studied on proteolytic enzymes. This methodology was pioneered by Smith, who expressed a diverse pool of peptides on the surface of bacteriophage M13 [21]. This technique was initially oriented to studying protein-protein interactions; however, it was later modified and adapted for protease substrate specificity profiling (Fig. 3) [22]. The main advantage of phage display is the generation of a very large and diverse pool of substrates (up to 10¹⁰ individual peptides), which would be very challenging through chemical synthesis. Another important feature of this technique is that these peptides are subjected to protease analysis after each round of expression/selection; thus, in subsequent cycles, peptides of increasing quality are generated. One of the main drawbacks of these methods is that the peptides are label-free. Therefore, the kinetic analysis of the best substrates requires these peptides to be resynthesized with an appropriate reporter group. Since Matthews and Wells first described the use of phage display in protease screening, multiple enzymes have been profiled with this technique [23].

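In this review, we describe important concepts and the latest challenges, focusing mainly on peptide-based substrate specificity techniques used to distinguish individual enzymes within major protease families.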

Chymotrypsin-like neutrophil serine proteases from primary granules

Neutrophil serine proteases (NSPs) of primary granules are some of the most abundant serine proteases present in almost all organisms. The NSP family includes neutrophil elastase (NE), cathepsin G (CatG), proteinase 3 (PR3) and neutrophil serine proteinase 4 (NSP4) [24-26]. Among others, these enzymes are involved in killing bacteria via the formation of neutrophil extracellular traps [27, 28]. NSPs evolved from a common ancestor by gene duplication, and they represent a new branch of the chymotrypsinogen superfamily of serine proteases [29]. NSP genes consist of five exons and four introns. Along with the genes for NE and PR3, the NSP4 gene lies on chromosome 19p13.3 outside of the AZU1-PRTN3-ELANE-CFD cluster [30-32], while the CatG gene is located on chromosome 14q11.2 [33]. Regarding primary structure, the amino acid residues that determine the substrate recognition sites for NSPs are quite similar, and catalytic residues and substrate-binding pockets are located in a crevice between two β-barrels [32]. Therefore, these enzymes share similar catalytic preferences and recognize the same endogenous inhibitors belonging to the serpin (serine protease inhibitors) family, such as antitrypsin or α-1 proteinase inhibitor and human monocyte/neutrophil elastase inhibitor, which inhibit PR3, NE and CatG [34, 35].

Chemically, NSPs can be divided into groups: NE and PR3 recognize small aliphatic amino acid residues (Val and Ala) at the P1 position, CatG recognizes bulky, hydrophobic, aliphatic or aromatic amino acids (Phe, Leu and sometimes Arg) and NSP4 has restricted P1-specificity for Arg [26, 36] . NE and PR3 also have overlapping primary substrate specificity at P4-P2; therefore a specific substrate, inhibitor or activity-based probe was not reported for these enzymes for many years. This specificity issue was confirmed in an elegant approach toward the global identification of neutrophil protease specificity that was recently described by O'Donoghue et al. [37] . In this method, a rapid multiplex substrate profiling analysis using liquid chromatography-tandem mass spectrometry sequencing was applied to profile the substrates of neutrophil serine proteases in human neutrophil extracellular traps. The obtained data clearly showed that each enzyme had overlapping, yet distinct, endopeptidase activities, suggesting overlapping functions and substrate preferences between neutrophil serine proteases [24] .

Recently, the Lesner group reported the design and synthesis of an internally quenched (IQF) PR3 substrate with the structure ABZ-Tyr-Tyr-Abu-Asn-Glu-Pro-Tyr(3-NO2)-NH2 and an enzyme catalytic efficiency (kcat/Km) of 1534 × 10³ M⁻¹·s⁻¹. The kcat/Km value for NE was not calculated using this substrate [38].

In 2014, Drag and colleagues enriched the classic PS-SCL with a broad range of unnatural amino acids in an approach known as the HyCoSuL [19] (Fig. 3). This technology allows researchers to explore the P4-P2 binding sites in various structures. Surprisingly, in contrast to previously known NE substrates with sequence Ac-AAPV-AMC (Fig. 5), the HyCoSuL P4 screen identified the bulky 2-amino-6-benzyloxyhexanoic acid (Nle(O-Bzl)) residue. This result was confirmed later by Lechtenberg, who presented a crystal structure of an NE ABP containing Nle(O-Bzl) at the P4 position [39] and demonstrated the fit of unnatural amino acids in the enzyme pockets in the active site. Moreover, methionine sulfone (Met(O)2) was identified at the S3 site, which suggested a potential posttranslational modification of natural NE substrates. Not surprisingly, both NE (octahydroindole-2-carboxylic acid, Oic) and PR3 prefer proline derivatives at the P2 position [19]. Crucially, a comparison of the kcat/Km values for the synthesized substrate against both NE and PR3 showed that the Ac-Nle(O-Bzl)-Met(O)2-Oic-Abu-ACC (Fig. 5) substrate is approximately 900-fold more specific for NE compared with PR3, exhibiting a relatively small Km of 0.28 μM for NE. An ABP for neutrophil elastase based on the optimal NE sequence was then used in in vitro studies, allowing researchers to discriminate between NE and PR3 and monitor the activity of neutrophil elastase in neutrophil traps for the first time [19].

CatG bears dual P1 chymotrypsin- and trypsin-like substrate specificity [36]. Due to the shape of the S1 pocket, CatG hydrolyzes the peptide bond after Phe and, with less potency, after Arg; therefore, this enzyme might cross-react with NSP4 substrates. In 2015, Kasperkiewicz et al. used the HyCoSuL technology to search for NSP4-specific molecules and investigate NSP4 primary specificity. Using this technology, specific NSP4 substrates and ABPs were synthesized and tested in in vitro assays. NSP4 recognized large hydrophobic residues at the P4 position (homocyclohexylalanine), large basic residues at the P3 (4-guanidinophenylalanine) and the proline derivative Oic at the P2 position (Fig. 5). The most successful substrate was potent (kcat/Km = 32 000 M⁻¹·s⁻¹) and approximately 70-fold more specific for NSP4 compared with CatG. This discovery prompted the synthesis of ABPs, and resulted in the first covalently binding NSP4-specific ABP, which exhibited approximately 1000-fold more specific binding to NSP4 compared with CatG [40].
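The fold-selectivity figures quoted in this section are ratios of catalytic efficiencies (kcat/Km) measured for the same substrate with two enzymes. The short Python sketch below illustrates that arithmetic with the NSP4 numbers given above. The CatG value is not reported in the text; it is back-calculated here from the stated 70-fold ratio purely for illustration, and the function name `selectivity_index` is hypothetical.

```python
def selectivity_index(kcat_km_target, kcat_km_off_target):
    """Fold selectivity of one substrate: ratio of kcat/Km values
    (both in M^-1 s^-1) for the target and the off-target protease."""
    return kcat_km_target / kcat_km_off_target


# Values taken from the text for the best NSP4 substrate.
nsp4_kcat_km = 32_000.0   # M^-1 s^-1
reported_fold = 70.0      # "approximately 70-fold" over CatG

# Assumption for illustration only: the CatG efficiency implied by the
# reported ratio. This number does not appear in the source.
implied_catg_kcat_km = nsp4_kcat_km / reported_fold
print(round(implied_catg_kcat_km))  # about 457 M^-1 s^-1

print(selectivity_index(nsp4_kcat_km, implied_catg_kcat_km))
```

Because the index is a ratio, a substrate can be highly selective while still being a poor substrate in absolute terms, so both kcat/Km and the ratio need to be reported.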

NSPs are involved in many diseases, particularly diseases associated with the lungs; thus, they are considered valuable drug targets. Moreover, a few drug development efforts have aimed to discriminate between serine proteases using inhibitors. Although a few of these inhibitors have been used in clinical trials [41], to date, only one compound, sivelestat (ONO-5046, Ono Pharmaceuticals), has been approved for use as a drug [42]. This competitive inhibitor is specific for NE over trypsin, thrombin, chymotrypsin, CatG and plasmin with a relatively small half-maximal inhibitory concentration (IC50) of 44 nM and Ki of 0.2 μM toward NE.

The best-known in vitro serine protease inhibitors are peptidyl derivatives of α-aminoalkylphosphonate diphenyl esters, covalent inhibitors that bind to the hydroxyl group of serine in the active site [43]. These compounds mimic natural amino acids, and they are some of the most frequently used NE inhibitors in in vitro studies. Oleksyszyn and colleagues modified the esters in this inhibitor class and subsequently found that an S-methylsulfodiphenyl di-ester was more potent than the unmodified ester [44, 45]. Unfortunately, this approach was not sufficient to distinguish NE and PR3. Recently, Guarino et al. designed di(chlorophenyl) phosphonate esters characterized by an unexpected structure containing Asp at the P2 position, instead of Pro, which was described as the best-fitting amino acid in this pocket. They used a probe with PEG66 [amino-omega-carboxy poly(ethylene glycol)] as a linker and biotin as a tag to detect PR3 in permeabilized neutrophils; the probe was characterized by a relatively small second-order inhibition constant (kobs/[I]) against proteinase 3 and showed weak binding to NE. The authors suggested that this probe may be used to monitor, detect and control PR3 activity in inflammatory diseases; however, no control was performed [46].

Human granzymes (Gr) were named from 'granule enzymes' in 1987 by Masson and Tschopp [47] . They are closely related proteases with a nucleophilic serine residue stabilized by histidine and aspartic acid at the enzyme's active site and are expressed in cytotoxic T lymphocytes and natural killer (NK) cells. Those proteases are packed into cytoplasmic granules, and similarly to NSPs, they are secreted during stimulation. In humans, five Grs are distinguished (A, B, H, K and M) [48] . However, granzyme genes were found on three chromosomes in three clusters (GrA, GrB and GrM clusters). The GrA cluster on chromosome 5 includes GrA [48, 49] and GrK; the GrB cluster on chromosome 14 includes GrB and GrH [50] ; and the GrM cluster on chromosome 19 includes GrM [51] . Grs share distinctive structural features, such as the presence of a signal peptide and pro-dipeptide, which are removed during activation. Although there are subtle differences in the crystal structures of Grs, their detection remains challenging because of the similarities between them. For instance, anti-GrB binds both GrA and GrH. This problem is magnified since it was shown that Grs recognize and cleave the same natural substrates; for instance, 80 GrA substrates are reported to be GrB substrates as well, and among them, 22 have the same cleavage site [52] . To date, no specific tool to distinguish between active and inactive Grs has been found; however, a few elegant approaches have been published and are discussed in this review.

GrA is the most abundant of all the granzymes [53] . The effects of GrA and K on caspase-independent cell death remain unclear [54] ; thus, it is necessary to distinguish between these enzymes to understand their individual roles. GrA and K primarily act in inflammation by promoting the secretion of cytokines, interleukins and tumor necrosis factor, and these enzymes recognize the same natural substrates [54] [55] [56] . The extended substrate specificity for these enzymes is almost identical. These enzymes recognize Tyr, Asp and Val at the P3 position, Phe and Pro at the P2 position, and Asp and Met at the P4 position. GrK also recognizes Leu, Phe and Trp at the P4 position and prefers Glu and Gln at P3. These results reveal substantial differences between GrA and K, particularly at the S3 and S4 subsites. Few differences between the substrates cleaved by these enzymes in vitro have been identified, and GrA and K might have different physiological functions [56] . Van Damme et al. [52] also explored and identified GrA substrate preferences in 2010 using a proteomic approach, with similar results. The utility of proteomics in granzyme specificity determination is reviewed elsewhere [15] .

Granzyme B (GrB) is the only known serine protease with restricted specificity for Asp at the P1 position [57]. This feature is common in caspases, which belong to the cysteine protease family. GrB and caspases have similar key functions, including the induction of apoptosis, although they belong to different protease families with different evolutionary origins [57]. In 1998, Harris and colleagues analyzed the extended substrate specificity of GrB using filamentous (M13) phage display and explored the P3, P1′ and P2′ positions with an I-P3-P-D-P1′-P2′ library with only one defined position; the other sites were fixed. Based on the results, GrB recognizes Glu at the P3 position and Gly at the P2′ position [58].

Unlike GrA and K, GrM and H have different preferences at the P1 position, thereby facilitating their differentiation. Several methods have been used to study GrM and H, including positional scanning libraries of coumarin substrates as well as screens of individual p-nitroanilide and coumarin substrates [59]. GrM hydrolyzes the peptide bond after the aliphatic amino acids Ile and Met and does not hydrolyze the bond after Phe. GrH also prefers hydrophobic amino acids at the P1 position, including Phe, but does not hydrolyze the peptide bond after Met [59-61]. Powers and colleagues used a series of peptide thiobenzyl esters to determine the primary specificity of GrM, which exhibited a preference for Pro at P2 and Ala, Ser and Asp at the P3 position [62]. Using P1-Leu and P1-Met PS-SCL, Mahrus et al. [59] explored the P2-P4 subsites and showed that the substrate specificity profiles of the two libraries were similar at the P4 and P2 positions, but were slightly different at the P3 position, which might be an effect of subsite cooperativity. Further studies on the extended P4-P4′ specificity were also conducted [63].

Mahrus et al. [64] used the PS-SCL approach to analyze GrM, K, H, B and A, and they identified differences in substrate preferences, allowing further investigations of specific inhibitors and ABPs.

In summary, a few peptide sequences for granzymes have been identified using PS-SCL, phage display and proteomics, which will aid future research into discriminating their individual physiological functions. The Van Damme group performed hierarchical clustering of the surface surrounding the granzyme active sites using proteomics, which confirmed observations based on chemical methods, that the S1 pocket is the most restrictive and binds Asp for GrB, Arg for GrA and K, Met/Leu for GrM and Phe for GrH, while the S2-S4 pockets bind a broad range of amino acid residues [15] , due to subtle differences in the primary specificity profiles. However, more research is necessary to distinguish among the granzymes, particularly GrA and K.
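The S1 preferences summarized above can be written as a small lookup table to show why P1 alone separates GrB, GrM and GrH but not GrA and GrK. The Python sketch below is illustrative only: the one-letter residue codes, the dictionary layout and the function name `candidates_for_p1` are assumptions, and the table encodes only the preferences stated in this summary.

```python
# S1 (P1) preferences as summarized in the text, in one-letter code:
# Asp for GrB, Arg for GrA and GrK, Met/Leu for GrM, Phe for GrH.
S1_PREFERENCES = {
    "GrA": {"R"},
    "GrK": {"R"},
    "GrB": {"D"},
    "GrM": {"M", "L"},
    "GrH": {"F"},
}


def candidates_for_p1(residue):
    """Return the granzymes whose S1 pocket accepts the given P1 residue."""
    return sorted(
        name for name, accepted in S1_PREFERENCES.items() if residue in accepted
    )


for p1 in "DRMLF":
    print(p1, candidates_for_p1(p1))
```

A P1-Arg substrate returns both GrA and GrK, so any discrimination between these two enzymes has to come from the P2-P4 (or prime-side) positions.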

The human tissue kallikrein family comprises at least 15 serine proteases (hK1-hK15), which are encoded by the largest protease gene cluster and are expressed in almost every tissue, including brain, breast and skin, among others [65] . These enzymes possess at least 30% sequence identity, a similar molecular size (25-30 kDa) and (chymo)trypsin specificities [7] .

Debela et al. determined the extended substrate specificity of seven human tissue kallikreins, hK3/prostate-specific antigen (PSA), hK4, hK5, hK6, hK7, hK10 and hK11, using a combinatorial peptide library. Based on their data, hK3 and hK7 show chymotrypsin-trypsin-like activity and hydrolyze peptide bonds after large hydrophobic (Tyr), aliphatic (Ala and Nle) or polar amino acids (Arg and Lys). The kallikreins hK4, hK5 and hK6 exhibit trypsin-like specificity, with a strong preference for P1-Arg, whereas hK10 and hK11 display dual specificity and hydrolyze P1 peptide bonds after both aliphatic (Nle, Met, and Ala) and basic amino acid residues (Lys and Arg). All investigated enzymes demonstrate broad specificity at the P4 and P3 subsites, but hK4 does not recognize Pro and hK5 does not recognize Asp at P3. All investigated kallikreins demonstrate subtle differences at the P2 and at the P1 positions, but hK3/hK7 and hK10/hK12 have similar substrate specificities [7]. The PS-SCL results agree with the structural data, and therefore might be helpful in future research. However, specific substrates for selected kallikreins have not been synthesized and characterized using these promising specificity profiles. Thus, subsite cooperativity has not been excluded and additional validation is necessary. Nevertheless, the presented data are valuable preliminary results.

PSA and hK2 display 79% sequence identity, which hinders the development of specific substrates, inhibitors and antibodies for these enzymes. In 2000, Wu et al. used a cyclic phage display peptide library to identify peptides specific for active PSA over other serine proteases, including chymotrypsin, CatG, kallikreins, trypsin and hK2 [66]. The identified sequence binds to the active site of PSA, allowing us to distinguish between active and inactive PSA, in contrast to antibody labeling, which binds both forms. Wu and colleagues used this discovery to develop new immunopeptidometric assays with unique specificities [67].

In addition, Hekim et al. used phage display to identify the specific sequence of a peptide inhibitor of hK2. In this approach, 10- and 11-amino-acid libraries were used and peptides binding to hK2 were isolated. A specific cyclic peptide substrate with no inhibitory activity and a linear peptide inhibitor (RFKXWW or ARRPXP) with no overlapping specificity toward PSA, chymotrypsin and various trypsin isoenzymes were obtained. These peptides differentiate between PSA and hK2, which was not previously possible [68].

In another approach, de Veer et al. [69] used a sparse matrix library consisting of 125 defined peptides to explore hK5 substrate specificity. GRSR, YRSR and GRNR were reported as the most active sequences, and an inhibitor was subsequently engineered by substituting the substrate sequences in the binding loop (P1, P2 and P4 residues) of sunflower trypsin inhibitor-1. However, despite their high activity, the designed inhibitors lacked selectivity. The authors then substituted the P2′ residue, which generated a specific hK5 inhibitor with the kinetic constant Ki = 4.2 ± 0.2 nM. The inhibitor displayed low activity toward hK7 and 12-fold selectivity over hK14 [69].

One of the most interesting and successful approaches for investigating specific molecules for kallikreins was described by Cloutier et al. [70] and Felber et al. [71]. Both groups used the same strategy to modify serpins (endogenous inhibitors of serine proteases), the reactive center loop of which is crucial for inhibiting serine proteases. The amino acids near the scissile bond of the reactive center loop were modified in another approach by extracting structural fragments to obtain specific recombinant hK2 [70] and hK14 [71] inhibitors. In these approaches, phage display methodology was used to select peptide sequences for hK2 (six variants) and hK14 (two variants). hK14 has a dual (chymo)trypsin-like substrate specificity, and Felber et al. described four variants of recombinant mutant serpins (two variants for each serpin): α-1-antitrypsin (AAT) and antichymotrypsin (ACT). In both ACT and AAT, five residues surrounding the scissile bond were replaced with a pentapeptide sequence that was previously obtained using phage display [72]. Specificity was tested against enzymes with a broad range of specificity, such as elastase, chymotrypsin, trypsin, plasma kallikreins and thrombin, and then tested against enzymes belonging to the same family, i.e. hK2, hK3, hK5, hK6, hK8 and hK13. One of the two variants of ACT and AAT displayed selectivity for hK14 (with the exception of chymotrypsin). All the variants specifically inhibited hK14, but two ACT-derived inhibitors formed more stable complexes with hK14, in contrast to the other AAT inhibitors [71]. For hK2, sequences were inserted into ACT and the resulting inhibitors were screened against human neutrophil elastase, as well as proteases with chymotrypsin-like (chymotrypsin, PSA) and trypsin-like (hK2, hK1, plasma kallikrein, urokinase-type plasminogen activator) activity; two variants of modified ACT retained specific activity against hK2 [70].
Using this clever mixed strategy (modification of the reactive center loop of serpins with different peptide sequences), the authors were able to distinguish between hK14 or hK2 and a broad range of serine proteases, with the exception of chymotrypsin.

Caspases are cysteine proteases that display very narrow preferences at the P1 position, primarily recognizing aspartic acid (with some exceptions) [73]. Such primary specificity is quite rare among proteases. However, Wells and colleagues have recently shown that these enzymes can also cleave substrates after glutamic acid and phosphoserine residues, shedding new light on their physiological functions [74]. Since the discovery and characterization of caspases 20 years ago, hundreds of papers have been published on their substrates and inhibitors [9, 75]. Unfortunately, their exclusive specificity for Asp at the P1 position makes it very difficult to distinguish these enzymes [76]. Using PS-SCL as a tool, Thornberry and colleagues determined the substrate preferences of the S4-S2 pockets of nine human caspases [17]. Based on these profiles, caspases were divided into three groups: caspase-1, -4 and -5, which prefer the (W/L)EHD sequence, caspase-3, -7 and -2, which recognize DE(V/H)D peptides, and caspase-6, -8 and -9, which favor (V/L)EHD motifs. Nevertheless, the specificity matrix clearly shows that these proteases display overlapping substrate specificity, and the 'optimal sequences' proposed by the authors may not be specific for only one enzyme. Indeed, later work by other groups has shown that short inherent substrates containing tetrapeptide motifs display cross-reactivity among the caspases, making it very difficult to use these substrates in cell extracts that contain different concentrations of several caspases [4, 77]. In another approach to design caspase-specific substrates, Stennicke and colleagues synthesized a panel of IQF substrates (Fig. 4) [78]. This strategy also allowed the authors to screen caspase preferences in the prime region (i.e. to the right of the scissile bond).
A kinetic analysis revealed that all tested caspases (1, 3, 6, 7 and 8) preferred small residues (Gly, Ser and Ala) at P1′, which could not be used as a discrimination factor to improve the specificity of the synthesized substrates. The inability to design caspase-specific substrates using a PS-SCL approach or by scanning internally quenched substrates could be due to their simple architecture. In these chemical approaches, the substrates are short linear peptides selected from a pool of peptide mixtures without an in-depth caspase subsite cooperativity analysis. Biological approaches represent an alternative strategy for substrate identification. One such strategy tested on caspases was the application of cellular libraries of peptide substrates (CLiPS), developed by Boulware et al. [79], in which potential caspase substrates were randomly synthesized by bacteria and then subjected to fluorescence-activated cell sorting analysis. This technique not only provided a general map of caspase preferences but also revealed detailed kinetic analyses of the substrates. However, the obtained results only confirmed that the preferred caspase-3 cleavage motif is DXVD/G, and new specific sequences were not identified. The main limitation of the described methods (using both synthetic and natural substrates) is that all of these peptides contain only natural amino acids in their structure. A potential solution is an approach that has been recently proposed by Poreba et al. [80]. In the HyCoSuL approach, P4-P2 peptide mixtures are enriched with over 100 unnatural amino acids, which provides a better exploration of the chemical space in the caspase active sites and attenuates the issue of overlapping specificity. The specificity of six human apoptotic caspases (3, 6, 7, 8, 9 and 10) toward the P4-P2 positions was screened using this approach.
A detailed kinetic analysis of their preferences allowed the development of peptides containing unnatural amino acids that had much greater selectivity indices than the previously reported substrates (Fig. 6). One of these substrates was then used to specifically monitor caspase-9 activation in HEK293F cytosolic extracts, in which apoptosis is triggered by cytochrome c [80].
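
The overlap between the three PS-SCL-derived caspase groups can be illustrated with a minimal sketch that treats the consensus motifs quoted above as simple patterns. This is only a qualitative proxy, since real cross-reactivity is a matter of kinetics rather than pattern matching; the function name `matching_groups` and the choice of test peptides are assumptions made for illustration.

```python
import re

# Consensus P4-P1 motifs for the three caspase groups, as quoted in the text.
# Treating them as regular expressions is an illustrative simplification.
GROUP_MOTIFS = {
    "caspase-1, -4, -5": "[WL]EHD",
    "caspase-3, -7, -2": "DE[VH]D",
    "caspase-6, -8, -9": "[VL]EHD",
}


def matching_groups(tetrapeptide: str) -> list[str]:
    """Return every caspase group whose consensus motif matches the peptide."""
    return [
        group
        for group, pattern in GROUP_MOTIFS.items()
        if re.fullmatch(pattern, tetrapeptide)
    ]


for peptide in ("WEHD", "LEHD", "DEVD", "VEHD"):
    print(peptide, "->", matching_groups(peptide))
```

In this sketch LEHD satisfies the consensus of two groups at once, which is the kind of ambiguity that makes tetrapeptides built only from natural amino acids hard to use in extracts containing several caspases.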

Another challenging issue in distinguishing between caspases is the development of specific inhibitors.

Because potent and selective caspase inhibitors are potential drug candidates, this area of research has been explored more rigorously than the identification of caspase substrates. To date, thousands of caspase inhibitors have been synthesized and evaluated [9]. Some caspase inhibitors have been synthesized and used to study apoptosis in vitro and in vivo, whereas others are primarily used in pharmaceutical research [81-83]. In general, almost all strategies for developing caspase-specific inhibitors focus on (a) optimizing a peptide (or peptidomimetic) scaffold (non-prime region), (b) exploring the S1′ pocket by introducing various warheads (cysteine traps) into PS-SCL-derived peptides (mostly DEVD, IETD and YVAD), or (c) optimizing lead non-peptide scaffold structures, such as isatins, indolones and quinolines. In previous studies, synthetic caspase inhibitors were simple peptides equipped with an electrophilic warhead (e.g. acyloxymethylketone (AOMK), fluoromethyl ketone, or an aldehyde) [9]. The peptide motifs were based on the PS-SCL results reported by Thornberry et al. [17]. However, these tools were subsequently shown to display only limited selectivity toward individual caspases [84, 85]. As the inclusion of natural amino acids in the P4-P1 positions has not produced specific inhibitors, unnatural amino acids have been introduced into peptide scaffolds. This work was pioneered by Bogyo and colleagues, who synthesized a P4-P2 combinatorial library of AOMK inhibitors containing 41 unnatural amino acids at each position [86]. The same approach was further applied by Wolan and colleagues, who synthesized a P5-P2 peptide library containing an aldehyde as an electrophilic warhead [87-89]. The introduction of unnatural amino acids into the peptide sequence resulted in increased inhibitor selectivity. Chemical approaches for developing caspase-specific inhibitors and ABPs have recently been reviewed by Poreba et al. [9].

Mucosa-associated lymphoid tissue lymphoma translocation protein 1 (MALT1) is a paracaspase and belongs to the cysteine protease family. MALT1 contains a domain homologous to caspases, and hence it was defined by Dixit et al. as a 'paracaspase' [90]. MALT1, along with caspases, belongs to the CD clan and has the catalytic His-Cys dyad characteristic of caspases. To date, only a few reports have described MALT1 substrates and probes. Probes were used on Jurkat cells, but extensive selectivity studies toward other enzymes have not been performed [91-93]. Interestingly, the probes proposed by Xin et al. bind to MALT1, but a few other signals were also observed. Therefore, there is a need to distinguish between MALT1 and other proteases that share Arg P1 specificity. It is necessary to determine the precise substrate specificity to design optimal substrates and inhibitors. The Salvesen group used PS-SCL to identify a few interesting features of MALT1. MALT1 has narrow specificity at the P4 and P1 positions, preferring Leu and Arg, respectively, but it has broad specificity at P3 and P2 [91]. This finding was helpful in the design of an ABP for this enzyme [92]. However, this ABP only contains natural amino acid residues and its specificity toward other proteases remains unclear. It might be beneficial to use unnatural amino acids to further optimize this probe to reduce potential cross-reactivity with other proteolytic enzymes. 

Similar to caspases, cathepsins are a group of cysteine proteases that display overlapping substrate specificity. A broad study of the substrate preferences of these enzymes was first described by Choe et al. [7], who used a PS-SCL approach to dissect the P4-P1 substrate specificity of six human cysteine cathepsins (L, V, K, S, F and B). The six cathepsins exhibited similar substrate preferences; the only significant difference was that cathepsin K recognized Pro at P2, whereas all other cathepsins did not. This distinguishing feature was sufficient to develop a cathepsin K-specific tetrapeptide substrate that was not hydrolyzed by other family members. However, an AOMK inhibitor with this sequence displayed off-target activity by also inhibiting cathepsin B. Another strategy for the rapid development of cathepsin substrates was substrate activity screening (SAS), proposed by Patterson et al. [94-96]. In this method, a library of aminocoumarin derivatives with various N-acyl groups was screened for activity against cathepsin S and the most active substrates were further converted into inhibitors. With the exception of PS-SCL and SAS, no other rapid methods for the discovery of selective cathepsin substrates have been reported. Most substrates developed for cathepsins are the result of extensive SAR studies, in which individual fluorogenic peptide substrates with variations in only one position were screened against a panel of cathepsins and the best candidates were selected for further optimization [97-99].

Similar to other proteases, the development of cathepsin-specific inhibitors mostly relies on studying a peptide or peptidomimetic backbone in the unprimed site or the investigation of electrophilic warheads at P1′; some of these examples have recently been reviewed [100]. One classic example for distinguishing between cathepsins is the optimization of E-64, a natural broad-spectrum inhibitor. Katunuma and colleagues have developed several highly selective cathepsin B (CA-030 and CA-074) and cathepsin L (CLIK-148 and CLIK-195) inhibitors by replacing the substituents in the epoxide electrophilic warhead and changing the non-prime recognition group [101]. Other cathepsin S- (CLIK-060) and cathepsin K-specific inhibitors (CLIK-163) were developed based on the structures of leupeptin and pyridoxal phosphate, respectively [101]. These examples clearly show that optimization of lead structures (mostly broad-spectrum inhibitors) is an excellent strategy for distinguishing closely related enzymes. Another strategy for the development of potent and selective cathepsin inhibitors is the previously mentioned SAS, in which the best substrates are converted into potent inhibitors by replacing the aminocoumarin reporter group with an electrophilic pharmacophore [95]. One such seminal example is the cathepsin S probe (BMV157) developed by Oresic Bender et al. [102]. BMV157 is a near-infrared quenched activity-based probe that becomes fluorescent only when bound to the cathepsin S active site. This molecule contains a Cy-5 fluorophore attached to a P1-Lys residue, a peptidomimetic fragment in the P3-P2 region and an acyloxymethylketone warhead equipped with QSY21, a bulky quenching group. The specificity of this probe toward cathepsin S was tested in both in vivo and in vitro models, demonstrating the wide range of its biological applications.

Legumain (asparaginyl endopeptidase) is a cysteine protease that is up-regulated in inflammatory diseases and multiple human cancers. Legumain is also very active in the tumor microenvironment; thus, it is thought to promote tumorigenesis. This protease shares some similarities with both previously described protease groups, caspases and cysteine cathepsins. An analysis of the tertiary structure of legumain assigned it to the CD clan (together with caspases, gingipains and separases). However, the localization of this enzyme is mainly lysosomal/endosomal, overlapping with cysteine cathepsins [103]. Thus, to specifically target legumain, there is a need to distinguish this protease from caspases and cysteine cathepsins. However, this goal is very challenging as legumain displays narrow specificity at the P1 position, recognizing only Asn and Asp residues. The first (quenched) activity-based probe that specifically targeted legumain in vivo (LE-28) was developed by the Bogyo group [104]. It contained a tripeptide scaffold (Glu-Pro-Asp) equipped with an acyloxymethylketone warhead and flanked by two bulky groups (Cy-5 as a fluorophore and QSY21 as a fluorescence quencher) (Fig. 4). As Glu-Pro-Asp is the preferred motif of caspases, the specificity of LE-28 was obtained not by sequence itself, but by introducing bulky groups that drive LE-28 cellular uptake into the lysosomes. Another approach to distinguish legumain from caspases was recently proposed by Poreba et al. [12]. Using HyCoSuL, the authors demonstrated that the legumain specificity at the P4 position was very broad, tolerating even D-amino acids (Fig. 6). Thus, it was possible to use counter selection to extract a P4-P1 tetrapeptide motif (with Asp at P1) that was not recognized by apoptotic caspases. This motif was then used to develop a legumain-specific substrate, inhibitor, and activity-based probe for tracking legumain activity in various biological models. 
This study provided proof of principle for the use of the CoSeSuL approach to develop specific tools for proteases that display broad substrate specificity.
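
The counter selection idea described above can be sketched as a simple filter: keep residues that the target enzyme accepts well and that the off-target enzyme accepts poorly. The following is a minimal illustration only, not the published CoSeSuL procedure; the residue labels, the activity values, the thresholds and the function name `counter_select` are all hypothetical.

```python
# Hypothetical relative activities (percent of the best residue) at one
# substrate position; every number below is invented for illustration.
target_activity = {"Asp": 80.0, "D-Tyr": 75.0, "Val": 60.0, "Ala": 55.0}
off_target_activity = {"Asp": 95.0, "D-Tyr": 2.0, "Val": 70.0, "Ala": 10.0}


def counter_select(target, off_target, min_target=50.0, max_off_target=15.0):
    """Keep residues the target accepts but the off-target enzyme rejects.

    Residues missing from the off-target profile are treated as inactive
    (an assumption). Hits are ranked by the gap between target and
    off-target activity, largest gap first.
    """
    hits = [
        residue
        for residue, activity in target.items()
        if activity >= min_target
        and off_target.get(residue, 0.0) <= max_off_target
    ]
    return sorted(
        hits,
        key=lambda residue: target[residue] - off_target.get(residue, 0.0),
        reverse=True,
    )


selected = counter_select(target_activity, off_target_activity)
print(selected)
```

With these invented numbers, residues that both enzymes accept are discarded even when they are the best for the target, which is the essential difference between counter selection and simply picking the most active residue at each position.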

The human genome encodes approximately 100 DUBs, representing more than 15% of all human proteases. These enzymes also display some similarities in substrate preferences. High-throughput screening has been one of the most successful strategies applied to differentiate these enzymes. The screening of over 40 000 compounds against the closely related ubiquitin C-terminal hydrolase L1 (UCH-L1) and L3 (UCH-L3) enzymes allowed the development of a UCH-L1-specific inhibitor based on an isatin O-acyl oxime scaffold [105]. In another approach, Borodovsky and colleagues used a chemical ligation method (intein strategy), in which the C terminus of the ubiquitin protein was modified with different warheads [106, 107]. Depending on the type of warhead, these probes exhibited some level of specificity toward different DUBs. In another approach, Drag et al. used PS-SCL to identify the precise substrate specificity of three human DUBs, ovarian tumor 1, isopeptidase T and UCH-L3, and one viral ubiquitin-specific protease, the papain-like protease from the severe acute respiratory syndrome coronavirus [108]. A kinetic analysis of the P4-P2 positions revealed differences between these enzymes that can be further exploited in the design of specific substrate candidates. The PS-SCL library approach was also used to analyze the substrate specificity of human desumoylating enzymes (sentrin-specific proteases; SENPs). The results from the screen of this library clearly showed similar substrate specificities for natural amino acids in the P4, P3 and P2 positions between SENPs. The design of specific probes for DUBs and SENPs remains a substantial challenge.

Matrix metalloproteinases (MMPs) include the 23 members of the zinc-dependent endopeptidase family in the metzincin class of metalloendopeptidases that share a common domain structure [109]. The most common classification of MMPs is based on the historical assessment of their catalytic preferences and their cellular localization. Therefore, one can distinguish four groups of MMPs: collagenases, gelatinases, stromelysins and membrane-type MMPs. Nevertheless, there are several MMPs that are not classified into these traditional groups [110, 111].

MMPs were initially associated with cancer progression due to their ability to degrade connective tissues between the linings of blood vessels and between cells. These enzymes are involved in a wide variety of biological processes, such as tissue remodeling associated with angiogenesis, morphogenesis and tissue repair [112]. Due to their important roles in humans, MMPs are subjected to very precise spatial and temporal control to prevent them from becoming destructive [109]. Over the last 15 years, many studies have focused on identifying selective molecules to discriminate between MMPs. The application of the high-throughput proteomic identification of protease cleavage site method, which investigates primary and extended substrate specificity, profiled MMPs 1, 2, 3, 7, 8, 9, 12, 13 and 14 using 4300 biologically diverse peptides. This study allowed the general specificity features shared by most MMPs to be determined and explored features characterizing particular MMPs [110, 111]. However, the greatest challenge in distinguishing between MMPs in the above-mentioned groups, as well as between these groups, is the ability of MMPs to hydrolyze similar peptide substrates. In 2003, MMP-2 and MMP-9 were shown by Chen et al. to exhibit overlapping substrate recognition, but they could be distinguished by their S2 pocket preferences [113]. Precise kinetic studies of the synthetic peptides showed that replacement of one amino acid residue in the P2 position can convert a relatively selective MMP-2 substrate into an MMP-9 substrate. This finding underlined the significance of the P2 position and suggested that the S2 subsites of MMP-2 and MMP-9 interact with substrates differently [113].

In 2014, Ratnikov et al. compared the MMP cleavage yield of a selected and extended set of phage peptide substrates from approximately 64 million sequences and quantified the structure-function dependence of eight different MMPs representing three phylogenetic subfamilies [114]. The chosen proteases included MMP-2 and -9 from the gelatinase subfamily; the transmembrane MMPs 14, 15, 16 and 24; and MMP-17 and -25 from the GPI-MMP subfamily, which contain a glycosylphosphatidylinositol (GPI) cell surface anchor. The calculated second-order rate constants (kobs) constitute a good distance measure of the functional similarities between the matrix metalloproteinase branches [114].

Based on the pathological role of MMPs in disease, attempts have been made to identify drugs that harness the therapeutic potential of MMP inhibition.

More than 50 inhibitors targeting MMP activity have been developed, but all have failed during clinical trials due to a lack of specificity toward the targeted enzymes [1].

Preliminary trials to develop MMP inhibitors have focused on compounds that chelate the catalytic Zn²⁺ ion and possess backbones mimicking the natural peptide substrate for the individual MMPs. The first-generation MMP inhibitors were composed of a collagen backbone and a hydroxamic acid (-C(O)NHOH) [115]. Based on this scaffold, several collagen-based peptidomimetic hydroxamates were developed, including ilomastat, marimastat and batimastat. Batimastat, the first MMP inhibitor to enter clinical trials [116], showed strong inhibition toward MMP-1 (IC50 = 3 nM), -2 (IC50 = 4 nM), -7 (IC50 = 6 nM) and -9 (IC50 = 1 nM); however, the low specificity observed in the clinical trials was disappointing. A new generation of hydroxamate-based MMP inhibitors consisting of a substituted aryl, a sulfonamide and a zinc-binding group was proposed. MMI-270 and MMI-166 were identified as selective inhibitors of MMP-2, -9 and -14 [117].
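
The narrow spread of the batimastat IC50 values quoted above can be made explicit with a short calculation. This is a rough sketch only: IC50 ratios depend on assay conditions and are a crude stand-in for true selectivity, and the helper name `fold_over_best` is an assumption introduced for illustration.

```python
# Batimastat IC50 values (nM) as quoted in the text.
ic50_nm = {"MMP-1": 3.0, "MMP-2": 4.0, "MMP-7": 6.0, "MMP-9": 1.0}


def fold_over_best(ic50):
    """Express each IC50 as a fold difference relative to the lowest IC50."""
    best = min(ic50.values())
    return {enzyme: value / best for enzyme, value in ic50.items()}


folds = fold_over_best(ic50_nm)
spread = max(folds.values())

for enzyme, fold in sorted(folds.items(), key=lambda item: item[1]):
    print(f"{enzyme}: {fold:.0f}-fold")
print(f"largest difference across the panel: {spread:.0f}-fold")
```

All four enzymes fall within a single order of magnitude of each other, which is consistent with the broad-spectrum behavior described for this inhibitor.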

Reverse hydroxamates and non-hydroxamate inhibitors were developed to avoid the limitations associated with hydroxamate-based inhibitors, such as metabolic inactivation and cross-reactivity with other metalloproteases. As the structures of MMPs were revealed by crystallography, the next generation of MMP inhibitors was no longer restrained to substrate-like compounds and new molecules with diverse peptidomimetic and non-peptidomimetic scaffolds were designed [118]. The investigation of numerous different structures led to the identification of selective MMP inhibitors. For example, the thiol-based inhibitor SB-3CT, which is composed of a deep-pocket-binding diphenylether scaffold, was specific for MMP-2 and -9 with inhibition constants (Ki) of 14 and 600 nM, respectively [119].

Due to the high structural homology between different members of the MMP family, it is difficult to distinguish the enzymes. Recent studies have turned to targeting less conserved sites of MMPs instead of the previously targeted catalytic sites. MMPs, like all proteolytic enzymes, possess prime and non-prime subsites. However, the S1′ pocket is the most important site for substrate recognition [109] and exhibits the greatest degree of variability among MMPs in terms of the amino acid sequence and pocket shape. Based on this knowledge, a number of S1′ pocket-specific inhibitors were designed that showed improved selectivity for MMP-2 over MMP-9 [120].

Another trend in distinguishing between members of the MMP family is the application of antibody-based therapeutics that utilize highly selective and potent function-blocking antibodies. An extremely selective MMP-14 inhibitor, DX-2400, was identified using a human antibody phage display library and automated selection strategies [121]. Further examples can be found in the literature [122, 123].

Almost all previous studies that focused on distinguishing MMPs constitute a useful starting point for future research. In the 1998 study by Mucha et al., the use of unusual amino acids improved MMP substrate specificity [124] and provided further perspectives for this type of study.

Aminopeptidases are exopeptidases that hydrolyze one or two amino acid residues from the N-terminal fragment of the substrate [125, 126]. Enzymes belonging to the aminopeptidase M1 family share several common features that are important for catalytic activation, such as the highly conserved HEXXH(X)18E zinc-binding and GAMEN motifs [127]. Until recently, aminopeptidase studies using selective substrates were generally neglected, mainly due to the lack of specific chemical tools, and only a few generic substrates were used. Classic examples include alanine and leucine substrates used for the M1 and M17 family, respectively [3]. The aminopeptidase fingerprint approach published by Drag et al. overcame this limitation [128] and led to the first scrupulous analysis of aminopeptidase N specificity in different species. A defined fluorogenic library with the sequence NH2-X-ACC, where X represents amino acid residues enriched with 42 unnatural amino acid derivatives, was applied. The library was later expanded with the addition of several new derivatives and has been applied to several aminopeptidases from humans and other organisms (e.g. bacteria, animals and plants) [129-131]. These studies have led to the general conclusion that unnatural amino acids are preferred by aminopeptidases over natural amino acids (Fig. 7). Moreover, this approach was applied to distinguish aminopeptidases in cell lysates and provide clear information about the presence of individual aminopeptidases, which was not previously possible [5]. To date, this aminopeptidase fingerprint approach using unnatural amino acids is the best-known method to distinguish between aminopeptidases.

Fig. 7. Examples of specific peptide sequences used in substrates for aminopeptidases. LtA4h, leukotriene A4 hydrolase.

Carboxypeptidases are enzymes responsible for the hydrolysis of C-terminal amino acids of their peptide substrates, and this attribute underlies their participation in various essential biological processes. To date, there are 30 known representatives of human carboxypeptidases or peptidases with established carboxypeptidase activity (reported in the MEROPS database) [132, 133]. They are classified by the chemical nature of their active site. Therefore, one can distinguish metallo-carboxypeptidases (families M2, M14, M20 and M28), serine carboxypeptidases (families S10 and S28) and cysteine carboxypeptidases (family C1). The substrate specificity of numerous carboxypeptidases has been determined using various techniques, primarily kinetic analyses of synthetic substrates and biologically relevant peptides [134], as well as proteome-derived peptide libraries [135]. In 2002, Barinka et al. used a novel assay, based on the fluorimetric detection of released α-amino groups, to determine the hydrolytic activity of recombinant human glutamate carboxypeptidase II and used the results to characterize the enzyme [136]. Dipeptide libraries were synthesized via solid-phase peptide synthesis, and the action of pure recombinant enzymes on a panel of dipeptides was tested. The release of a C-terminal amino acid residue was determined fluorometrically utilizing OPA derivatization [136]. The substrate specificity of carboxypeptidase M was determined in a similar way by Deiteren et al. in 2007, the difference being the utilization of a series of benzoyl-Xaa-Arg substrates [137]. Not long after, Fricker and colleagues defined the enzymatic preferences of carboxypeptidase A4 and A6 for the first time by combining the strengths of three interdependent approaches: a kinetic analysis of chromogenic substrates, the individual cleavage of synthetic peptides and a quantitative peptidomics mass spectrometry-based approach [138, 139]. 
To date, there have been many further studies of carboxypeptidases and their substrate specificities, which can be found in the literature [140] .

One enzyme, few catalytic activities

The ubiquitin-proteasome system plays a pivotal role in the controlled degradation of proteins in eukaryotic cells. Its proteolytic action on regulatory proteins makes it one of the most important enzyme systems, and it is involved in almost all biological processes, including cell development and differentiation. Proteolysis of the substrates occurs within the 20S proteasome core particle, which is a barrel-shaped protein complex containing four stacked rings that are composed of seven α subunits and seven β subunits, all of which are unique. Only three of these subunits possess the catalytic threonine residue required for proteolytic cleavage: the β1, β2 and β5 subunits [141].

To date, studies conducted by Harris et al. have been the most important because they provided primary knowledge of the substrate specificity of individual proteasome subunits. The primary N-terminal and extended substrate specificity of the human 20S proteasome were defined in the presence of 11S proteasome activators using a comprehensive, composite set of fluorogenic substrates. The data promoted the full characterization of proteasome activity and the dissection of the proteolytic effects of the individual proteasome subunits, information that is essential for the design of more specific substrates and inhibitors [141]. This approach constituted a powerful tool to determine the non-prime residues of a peptide substrate. Other methods were employed to examine prime-side substrate specificity. For example, Lesner et al. employed combinatorial chemistry with internally quenched fluorescent peptide substrates to synthesize peptide substrates optimized at both the prime and non-prime regions of the peptide chain. This approach was used to determine the trypsin-like subunit substrate specificity and select the best substrates (ABZ-Val-Val-Ser-Arg-Ser-Leu-Gly-Tyr(3NO2)-NH2, kcat/Km = 934 000 M⁻¹·s⁻¹; and ABZ-Val-Val-Ser-GNF-Ala-Met-Gly-Tyr(3NO2)-NH2, kcat/Km = 1 980 000 M⁻¹·s⁻¹), which showed significant selectivity over different proteasome units [142]. Later, the Lesner group investigated the chymotrypsin-like activity of the 20S proteasome subunit (β5), identified the selective substrate ABZ-Val-Val-Ser-Tyr-Ala-Met-Gly-Tyr(3NO2)-NH2 and determined its kinetic parameters (kcat/Km = 9.7 × 10⁵ M⁻¹·s⁻¹, kcat = 8 s⁻¹). These were the first studies on the chymotrypsin-like activity alone, leading to the optimization of the prime area of the substrate and avoiding non-selective processing by other proteases with overlapping specificity [143].
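
The kinetic parameters quoted above allow a quick back-calculation. Assuming simple Michaelis-Menten behavior and that the reported kcat and kcat/Km values come from the same measurement (both are assumptions, not statements from the cited work), Km follows from Km = kcat / (kcat/Km). The function name `km_from_efficiency` is introduced here for illustration.

```python
def km_from_efficiency(kcat_per_s: float, kcat_over_km: float) -> float:
    """Return Km in mol/L from kcat (1/s) and kcat/Km (1/(M*s))."""
    return kcat_per_s / kcat_over_km


# Values quoted in the text for the chymotrypsin-like (beta5) substrate.
km_molar = km_from_efficiency(8.0, 9.7e5)
print(f"Km ~ {km_molar * 1e6:.1f} uM")

# Ratio of the two trypsin-like substrate efficiencies quoted in the text.
efficiency_ratio = 1_980_000 / 934_000
print(f"efficiency ratio ~ {efficiency_ratio:.2f}")
```

Under these assumptions the chymotrypsin-like substrate has a Km in the low micromolar range, and the second trypsin-like substrate is hydrolyzed roughly twice as efficiently as the first.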

Many natural or synthetic compounds can inhibit protein degradation by the proteasome. Natural inhibitors include lactacystin, epoxyketones and the TMC-95 cyclic peptides. The first synthetic compounds reported to inhibit proteasome activity were peptide aldehydes such as Z-Leu-Leu-Leu-al (MG132), Z-Ile-Glu(OBut)-Ala-Leu-al (PSI), or peptide vinyl sulfones. These compounds bind to the active sites within the 20S core particle. However, these reactive compounds have many drawbacks, including a lack of specificity toward individual proteasome active subunits. Additionally, most of these compounds are also known to inhibit serine and cysteine proteases (e.g. calpains and cathepsins); therefore, it would be inadvisable to use these compounds as drugs [144].

To date, the three most important proteasome inhibitors that have been approved as drugs are bortezomib [145], carfilzomib [146] and ixazomib [147]. All three preferentially bind to the β5 subunit, which is responsible for the chymotrypsin-like activity of the proteasome, but they also inhibit the caspase- and trypsin-like proteolytic sites at higher doses. G. de Bruin and colleagues reported the most significant attempt to distinguish particular proteasome subunits. Based on a previous study, this group presented a set of ABPs that promoted the gel-based detection of all catalytic human proteasome subunits and also developed specific inhibitors for individual proteasome activities [148].

Recently, the Bogyo group designed inhibitors based on amino acid preferences specific to Plasmodium falciparum and not the human proteasome [149]. This elegant work showed that proteasome orthologs can be distinguished using natural amino acids; in this case, the bulky tryptophan was a key amino acid. The human proteasome showed chymotrypsin-like (bulky amino acids), trypsin-like (basic amino acids) and caspase-like (acidic, Asp) activity, whereas the P. falciparum proteasome preferred aromatic Trp at the S2 and S3 pockets [149].

Our knowledge about proteases has increased dramatically over the years (Table 1). However, studies were typically restricted to one particular group of enzymes, not individual enzymes, due to the lack of specific substrates, inhibitors and ABPs. For several years, many studies have attempted to identify specific molecules that act on different enzymes with similar substrate preferences (Fig. 2). A number of modern design techniques have been applied to identify protease-specific sequences, including phage display, PS-SCL, CLiPS and proteomics. However, these methods are limited to natural amino acids, which are not always sufficient to distinguish between closely related enzymes such as caspases, neutrophil serine proteases, aminopeptidases or cathepsins. The recently described HyCoSuL and CoSeSuL techniques incorporate a broad range of unnatural structures and are currently the methods of choice to distinguish between enzymes with similar substrate specificities that cannot be successfully targeted with chemical tools possessing only natural amino acids.

Emerging principles in protease-based drug discovery

Targeting proteases: successes, failures and future prospects

MEROPS: the peptidase database

Overlapping cleavage motif selectivity of caspases: implications for analysis of apoptotic pathways

Activity profiling of aminopeptidases in cell lysates using a fluorogenic substrate library

Substrate profiling of cysteine proteases using a combinatorial peptide library identifies functionally unique specificities

Specificity profiling of seven human tissue kallikreins reveals individual subsite preferences

Current strategies for probing substrate specificity of proteases

Small molecule active site directed tools for studying human caspases

New approaches for dissecting protease functions to improve probe development and drug discovery

Recent advances and concepts in substrate specificity determination of proteases using tailored libraries of fluorogenic substrates with unnatural amino acids

Counter selection substrate library strategy for developing specific protease substrates and probes

Current trends and challenges in proteomic identification of protease substrates

Activity-based profiling of proteases

Holistic view on the extended substrate specificities of orthologous granzymes

A combinatorial approach for determining protease specificities: application to interleukin-1beta converting enzyme (ICE)

A combinatorial approach defines specificities of members of the caspase family and granzyme B. Functional relationships established for key mediators of apoptosis

Rapid and general profiling of protease specificity by using combinatorial fluorogenic substrate libraries

Design of ultrasensitive probes for human neutrophil elastase through hybrid combinatorial substrate library profiling

Intramolecularly quenched fluorogenic substrates for hydrolytic enzymes

Filamentous fusion phage: novel expression vectors that display cloned antigens on the virion surface

Substrate phage: selection of protease substrates by monovalent phage display

Phage display substrate: a blind method for determining protease specificity

Global substrate profiling of proteases in human neutrophil extracellular traps reveals consensus motif predominantly contributed by elastase

Neutral serine proteases of neutrophils

NSP4 is stored in azurophil granules and released by activated neutrophils as active endoprotease with restricted specificity

Neutrophil serine proteases in antibacterial defense

Neutrophil extracellular traps kill bacteria

Zymogen activation specificity and genomic structures of human neutrophil elastase and cathepsin G reveal a new branch of the chymotrypsinogen superfamily of serine proteinases

Three human elastase-like genes coordinately expressed in the myelomonocyte lineage are organized as a single genetic locus on 19pter

Localization of the gene encoding proteinase-3 (the Wegener's granulomatosis autoantigen) to human chromosome band 19p13.3

NSP4, an elastase-related protease in human neutrophils with arginine specificity

The human mast cell chymase gene (CMA1): mapping to the cathepsin G/granzyme gene cluster and lineage-restricted expression

Sequence and molecular characterization of human monocyte/neutrophil elastase inhibitor

Human plasma proteinase inhibitors

Specificity of human cathepsin G

Global identification of peptidase specificity by multiplex substrate profiling

A new proteinase 3 substrate with improved selectivity over human neutrophil elastase

The elastase-PK101 structure: mechanism of an ultrasensitive activity-based probe revealed

Design of a selective substrate and activity based probe for human neutrophil serine protease 4

Protease inhibitors in the clinic

Design and synthesis of new orally active nonpeptidic inhibitors of human neutrophil elastase

Irreversible inhibition of serine proteases by peptidyl derivatives of alpha-aminoalkylphosphonate diphenyl esters

Simple phosphonic inhibitors of human neutrophil elastase

Human neutrophil elastase phosphonic inhibitors with improved potency of action

New selective peptidyl di(chlorophenyl) phosphonate esters for visualizing and blocking neutrophil proteinase 3 in human diseases

A family of serine esterases in lytic granules of cytolytic T lymphocytes

Granules of cytolytic T-lymphocytes contain two serine esterases

Generation of catalytically active granzyme K from Escherichia coli inclusion bodies and identification of efficient granzyme K inhibitors in human plasma

Human and murine cytotoxic T lymphocyte serine proteases: subsite mapping with peptide thioester substrates and inhibition of enzyme activity and cytolysis by isocoumarins

Met-ase: cloning and distinct chromosomal location of a serine protease preferentially expressed in human natural killer cells

The substrate specificity profile of human granzyme A

Granzyme A activates another way to die

Human and mouse granzyme A induce a proinflammatory cytokine response

Ignition of p53 bomb sensitizes tumor cells to granzyme K-mediated cytolysis

Granzyme K displays highly restricted substrate specificity that only partially overlaps with granzyme A

A quarter century of granzymes

Definition and redesign of the extended substrate specificity of granzyme B

Granzyme M is a regulatory protease that inactivates proteinase inhibitor 9, an endogenous inhibitor of granzyme B

The human cytotoxic T cell granule serine protease granzyme H has chymotrypsin-like (chymase) activity and is taken up into cytoplasmic vesicles reminiscent of granzyme B-containing endosomes

A novel substrate-binding pocket interaction restricts the specificity of the human NK cell-specific serine protease, Met-ase-1

Subsite specificities of granzyme M: a study of inhibitors and newly synthesized thiobenzyl ester substrates

Human and mouse granzyme M display divergent and species-specific substrate specificities

Selective chemical functional probes of granzymes A and B reveal granzyme B is a major effector of natural killer cell-mediated lysis of target cells

Unleashing the therapeutic potential of human kallikrein-related serine proteases

Identification of novel prostate-specific antigen-binding peptides modulating its enzyme activity

Immunopeptidometric assay for enzymatically active prostate-specific antigen

Novel peptide inhibitors of human kallikrein 2

Exploring the active site binding specificity of kallikrein-related peptidase 5 (KLK5) guides the design of new peptide substrates and inhibitors

Development of recombinant inhibitors specific to human kallikrein 2 using phage-display selected substrates

Mutant recombinant serpins as highly specific inhibitors of human kallikrein 14

Enzymatic profiling of human kallikrein 14 using phage-display substrate technology

Caspases: keys in the ignition of cell death

Cacidases: caspases can cleave after aspartate, glutamate and phosphoserine residues

Human caspases: activation, specificity, and regulation

Caspase substrates and inhibitors

Evaluation of recombinant caspase specificity by competitive substrates

Internally quenched fluorescent peptide substrates disclose the subsite preferences of human caspases 1, 3, 6, 7 and 8

Protease specificity determination by using cellular libraries of peptide substrates (CLiPS)

Unnatural amino acids increase sensitivity and provide for the design of highly selective caspase substrates

Pharmacological caspase inhibitors: research towards therapeutic perspectives

Caspase inhibitors: viral, cellular and chemical

Caspase inhibitors: a pharmaceutical industry perspective

Some commonly used caspase substrates and inhibitors lack the specificity required to monitor individual caspase activity

Commonly used caspase inhibitors designed based on substrate specificity profiles lack selectivity

Identification of early intermediates of caspase activation using selective inhibitors and activity-based probes

Selective inhibition of initiator versus executioner caspases using small peptides containing unnatural amino acids

Selective detection and inhibition of active caspase-3 in cells with optimized peptides

Selective detection of caspase-3 versus caspase-7 using activity-based probes with key unnatural amino acids

Identification of paracaspases and metacaspases: two ancient families of caspase-like proteins, one of which plays a key role in MALT lymphoma

Mechanism and specificity of the human paracaspase MALT1

Probes to monitor activity of the paracaspase MALT1

Development of new Malt1 inhibitors and probes

Substrate activity screening (SAS): a general procedure for the preparation and screening of a fragment-based non-peptidic protease substrate library for inhibitor discovery

Identification of selective, nonpeptidic nitrile inhibitors of cathepsin S using the substrate activity screening method

Substrate activity screening: a fragment-based method for the rapid identification of nonpeptidic protease inhibitors

Photometric or fluorometric assay of cathepsin B, L and H and papain using substrates with an aminotrifluoromethylcoumarin leaving group

Comparative substrate specificity analysis of recombinant human cathepsin V and cathepsin L

S3 to S3' subsite specificity of recombinant human cathepsin K and development of selective internally quenched fluorescent substrates

Cysteine proteases as therapeutic targets: does selectivity matter? A systematic review of calpain and cathepsin inhibitors

Structure-based development of specific inhibitors for individual cathepsins and their medical applications

Design of a highly selective quenched activity-based probe and its application in dual color imaging studies of cathepsin S activity localization

Structure and function of legumain in health and disease

Functional imaging of legumain in cancer using a new quenched activity-based probe

Discovery of inhibitors that elucidate the role of UCH-L1 activity in the H1299 lung cancer cell line

Chemistry-based functional proteomics reveals novel members of the deubiquitinating enzyme family

Specific and covalent targeting of conjugating and deconjugating enzymes of ubiquitin-like proteins

Positional-scanning fluorigenic substrate libraries reveal unexpected specificity determinants of DUBs (deubiquitinating enzymes)

Is there new hope for therapeutic matrix metalloproteinase inhibition?

Active site specificity profiling datasets of matrix metalloproteinases

Active site specificity profiling of the matrix metalloproteinase family: proteomic identification of 4300 cleavage sites by nine MMPs explored with structural and synthetic peptide cleavage analyses

Strategies for MMP inhibition in cancer: innovations for the posttrial era

A residue in the S2 subsite controls substrate selectivity of matrix metalloproteinase-2 and matrix metalloproteinase-9

Basis for substrate recognition and distinction by matrix metalloproteinases

Design and therapeutic application of matrix metalloproteinase inhibitors

Phase I study of intrapleural batimastat (BB-94), a matrix metalloproteinase inhibitor, in the treatment of malignant pleural effusions

Anti-invasive effect of MMI-166, a new selective matrix metalloproteinase inhibitor, in cervical carcinoma cell lines

To bind zinc or not to bind zinc: an examination of innovative approaches to improved metalloproteinase inhibition

Potent mechanism-based inhibitors for matrix metalloproteinases

An integrated computational and experimental approach to gaining selectivity for MMP-2 within the gelatinase subfamily

Generation of high-affinity human antibodies by combining donor-derived and synthetic complementarity-determining-region diversity

SZ-117, a monoclonal antibody against matrix metalloproteinase-2 inhibits tumor cell-mediated angiogenesis

Inhibitors of gelatinase B/matrix metalloproteinase-9 activity: comparison of a peptidomimetic and polyhistidine with single-chain derivatives of a neutralizing monoclonal antibody

Membrane type-1 matrix metalloprotease and stromelysin-3 cleave more efficiently synthetic substrates containing unusual amino acids in their P1' positions

Aminopeptidases: structure and function

Aminopeptidases: towards a mechanism of action

The characteristics, functions and inhibitors of three aminopeptidases belonging to the M1 family

Aminopeptidase fingerprints, an integrated approach for identification of good substrates and optimal inhibitors

S1 pocket fingerprints of human and bacterial methionine aminopeptidases determined using fluorogenic libraries of substrates and phosphorus based inhibitors

Fingerprinting the substrate specificity of M1 and M17 aminopeptidases of human malaria, Plasmodium falciparum

Aminopeptidase N1 (EtAPN1), an M1 metalloprotease of the apicomplexan parasite Eimeria tenella, participates in parasite development

Carboxypeptidases from A to Z: implications in embryonic development and Wnt binding

Carboxyterminal protein processing in health and disease: key actors and emerging technologies

A novel rat carboxypeptidase, CPA2: characterization, molecular cloning, and evolutionary implications on substrate specificity in the carboxypeptidase gene family

Proteome-derived peptide libraries to study the substrate specificity profiles of carboxypeptidases

Substrate specificity, inhibition and enzymological analysis of recombinant human glutamate carboxypeptidase II

The role of the S1 binding site of carboxypeptidase M in substrate specificity and turnover

Substrate specificity of human carboxypeptidase A6

Characterization of the substrate specificity of human carboxypeptidase A4 and implications for a role in extracellular peptide processing

Comparative substrate specificity study of carboxypeptidase U (TAFIa) and carboxypeptidase N: development of highly selective CPU substrates as useful tools for assay development

Substrate specificity of the human proteasome

Novel internally quenched substrate of the trypsin-like subunit of 20S eukaryotic proteasome

Bladder cancer detection using a peptide substrate of the 20S proteasome

Proteasome inhibitors: from research tools to drug candidates

Bortezomib (PS-341): a novel, first-in-class proteasome inhibitor for the treatment of multiple myeloma and other cancers

Carfilzomib: a novel second-generation proteasome inhibitor

Evaluation of the proteasome inhibitor MLN9708 in preclinical models of human cancer

A set of activity-based probes to visualize human (immuno) proteasome activities

Structure- and function-based design of Plasmodium-selective proteasome inhibitors

Discriminating between the activities of human neutrophil elastase and proteinase 3 using serpin-derived fluorogenic substrates

Design and use of highly specific substrates of neutrophil elastase and proteinase 3

Separation of enzymatically active and inactive prostate-specific antigen (PSA) by peptide affinity chromatography

Irreversible inhibitors of serine, cysteine, and threonine proteases

Fast profiling of protease specificity reveals similar substrate specificities for cathepsins

Proteomic identification of protease cleavage sites characterizes prime and non-prime specificity of cysteine cathepsins B, L, and S

PK, MP, KG and MD wrote and reviewed the manuscript.