key: cord-0996450-voem879q authors: Shao, Yi‐Ming; Yang, Wen‐Bin; Peng, Hung‐Pin; Hsu, Min‐Feng; Tsai, Keng‐Chang; Kuo, Tun‐Hsun; Wang, Andrew H.‐J.; Liang, Po‐Huang; Lin, Chun‐Hung; Yang, An‐Suei; Wong, Chi‐Huey title: Structure‐Based Design and Synthesis of Highly Potent SARS‐CoV 3CL Protease Inhibitors date: 2007-08-23 journal: Chembiochem DOI: 10.1002/cbic.200700254 sha: 0db9994c58543d320be02e17f2298586e3e566d0 doc_id: 996450 cord_uid: voem879q

In a successful example of lead optimization by computer modeling prediction, computational technology was used to optimize a lead inhibitor (TL‐3) of the SARS‐CoV 3CL protease. A novel C2‐symmetric diol (1) was then designed and synthesized, and displayed higher affinity than the original lead compound by one order of magnitude in its inhibition constant (0.6→0.073 μM). We believe that this approach has provided a platform for further lead optimization.

Severe acute respiratory syndrome (SARS), a life-threatening respiratory disease, was first reported in Southern China in November 2002, and it spread widely to other Asian countries, North America, and Europe. According to the World Health Organization (WHO), a total of 8098 people worldwide became sick with SARS during the 2003 outbreak, and 774 of the infected patients died. A novel coronavirus associated with cases of SARS (SARS-CoV) was identified as the etiological agent of this endemic atypical pneumonia. [1] SARS-CoV is a single-stranded, positive-sense RNA virus, and its genome comprises both replicase and structural regions. [2] The polyproteins pp1a (486 kDa) and pp1ab (790 kDa), encoded by the viral replicase gene, undergo extensive proteolytic processing by viral proteases to produce multiple functional subunits, which are responsible for the formation of the replicase complex.
The SARS-CoV 3CL protease, named after the 3C protease of the Picornaviridae, is a ~33 kDa cysteine protease that cleaves the replicase polyprotein at 11 conserved sites with canonical Leu-Gln↓(Ser, Ala, Gly) sequences. [3–5] Because of the functional importance of the SARS-CoV 3CL protease in the viral life cycle, together with successes in developing efficacious antiviral agents targeting 3C-like proteases in other viruses, [6] this enzyme has been recognized as a prime target for therapeutic intervention against SARS-CoV infection. In its X-ray crystal structures the 3CL protease forms a dimer of two protomers, each of which is composed of three domains. The active site contains a catalytic dyad (Cys145 and His41), and the substrate-binding subsite S1 of the enzyme has absolute specificity for Gln-P1 of the substrate. [7] To date, a large number of inhibitors of the 3CL protease have been studied, including molecules identified from high-throughput screening, [8] electrophilic analogues, [9] isatin derivatives, [10] peptidomimetic α,β-unsaturated esters, [11] peptidic anilides, [12] and benzotriazole esters. [13] However, structural studies that would provide an in-depth understanding of the molecular interactions in the enzyme-inhibitor complex, or support structure-based optimization, are lacking for these molecules. Four reports describe structures of the 3CL protease in complex with inhibitors. [7, 14–16] In all these complex structures, the ligands are irreversible inhibitors; that is, they are covalently bound to the target protein.
Although the Sγ atom of Cys145 at the enzyme active site forms a covalent bond variously with the methylene group of the chloromethyl ketone (CMK), [7] the Cβ atoms of Michael acceptors, [14, 15] or the C3 atom of an aza-peptide epoxide (APE), [16] these mechanism-based inhibitors do not demonstrate satisfactory potency (IC50 = 2 mm for CMK; IC50 = 70 mm, Ki = 10.7 mm for Michael acceptors; Ki = 18 mm for APE) against the 3CL protease. In general, reversible inhibitors produce fewer side effects than suicide inhibitors and are thus more suitable for therapeutic development. [17] TL-3, a noncovalent HIV protease inhibitor (Ki = 1.5 nM) previously developed in our laboratory, [18] was found to be an inhibitor of the 3CL protease with a Ki value of 0.6 μM. [19] Previous studies have shown that TL-3 is effective against FIV protease and many drug-resistant HIV proteases, has a strong ability to control lentiviral infections in tissue culture, and exerts no adverse effects in ICR mice at doses of up to 2000 mg kg⁻¹ by gavage during the 14 day study period. The compound is also negative in the mouse peripheral blood micronucleus assay. With these considerations taken into account, TL-3 was selected as a lead compound for further optimization in the search for higher inhibition potency. We initially incorporated a series of L-amino acids in place of the Val-Ala residues of TL-3. However, none of these enhanced the inhibitory activity (see the Supporting Information). The negative results led us to suspect that the main binding mode of TL-3 was energetically dominated by the two phenyl groups. Our optimization of TL-3 as an inhibitor of the 3CL protease, by replacement of either the peripheral Val-Ala residues or the two central phenyl groups, was therefore based on the rationale that the binding mode of TL-3 in the protein-ligand complex mainly involves at least a dipeptide scaffold.
We thus used computational modeling to explore all 20 × 20 dipeptides as model ligands for possible protein-ligand interactions. An exhaustive exploration of the dipeptide binding modes with automated computational modeling procedures identified the Trp-Trp dipeptide as the strongest binder on the basis of the ranking system used. Moreover, 63 of the top-ranked 100 protein-ligand complexes contained at least one Trp residue (for details see the Supporting Information). We thus synthesized two compounds to test the binding mode hypothesis: one with two Trp groups adjacent to the central diol (4, Scheme 1) and the other with two additional Val-Ala residues, as in 9. The synthesis started with L-Trp as shown in Scheme 1. Cbz-L-tryptophan (2) was converted into its Weinreb amide (3), and this was followed by reduction to give the protected L-tryptophanal. The aldehyde was then subjected to stereoselective pinacol homocoupling according to the reported method [20] to yield diol 4. Successive protection of the diol as the isopropylidene acetal and of the indole nitrogens with tert-butoxycarbonyl (Boc) groups provided 6 in 53 % overall yield. Removal of the terminal Cbz groups by hydrogenolysis and subsequent coupling with Cbz-Val-OH produced adduct 7 in 78 % yield. Compound 8, containing additional Ala residues, was prepared by a similar process in 98 % yield. Final deprotection under acidic conditions afforded the desired diol 9 in 43 % yield. Compounds 4 and 9 were tested against the 3CL protease and proved to be more effective competitive inhibitors of the 3CL protease than TL-3, with Ki values of 0.34 and 0.073 μM, respectively. Notably, compound 9 is highly selective for the 3CL protease: no inhibition of HIV protease was observed at 100 mm.
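The arithmetic behind the dipeptide screen and the reported inhibition constants can be checked with a short script. This is only a sketch, not the modeling procedure used in the study: it enumerates the 20 × 20 dipeptide library, computes how often Trp would appear by chance, and compares the reported Ki values. All function and variable names are hypothetical. The chance baseline assumes that ranked complexes are drawn uniformly over dipeptides, and the free-energy estimate assumes ΔΔG = RT ln(Ki,new/Ki,lead) at 298.15 K; neither assumption is stated in the text.

```python
from itertools import product
import math

# One-letter codes for the 20 standard amino acids (W = Trp).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def enumerate_dipeptides(residues=AMINO_ACIDS):
    """Return every ordered dipeptide as a two-letter string."""
    return ["".join(pair) for pair in product(residues, repeat=2)]


def fraction_containing(dipeptides, residue):
    """Fraction of dipeptides that contain the residue at least once."""
    hits = sum(1 for d in dipeptides if residue in d)
    return hits / len(dipeptides)


def fold_improvement(ki_lead, ki_new):
    """Ratio of inhibition constants (both in the same units)."""
    return ki_lead / ki_new


def delta_delta_g_kcal(ki_lead, ki_new, temperature_k=298.15):
    """Estimated binding free-energy change in kcal/mol.

    Assumes ddG = R*T*ln(Ki_new / Ki_lead); the temperature is an
    assumption, not a value reported in the text.
    """
    gas_constant = 1.987204e-3  # kcal mol^-1 K^-1
    return gas_constant * temperature_k * math.log(ki_new / ki_lead)


dipeptides = enumerate_dipeptides()
n_trp = sum(1 for d in dipeptides if "W" in d)
baseline = fraction_containing(dipeptides, "W")
observed = 63 / 100  # Trp-containing complexes among the top-ranked 100

print(f"dipeptides in library: {len(dipeptides)}")   # 400
print(f"dipeptides containing Trp: {n_trp}")         # 39
print(f"chance fraction with Trp: {baseline:.4f}")
print(f"enrichment over chance: {observed / baseline:.1f}-fold")

# Reported Ki values in micromolar: TL-3, compound 4, compound 9.
KI_TL3, KI_4, KI_9 = 0.6, 0.34, 0.073
print(f"compound 4 vs TL-3: {fold_improvement(KI_TL3, KI_4):.1f}-fold")
print(f"compound 9 vs TL-3: {fold_improvement(KI_TL3, KI_9):.1f}-fold")
print(f"ddG (9 vs TL-3): {delta_delta_g_kcal(KI_TL3, KI_9):.2f} kcal/mol")
```

Under the uniform-sampling assumption, only 39 of the 400 dipeptides (just under 10 %) contain Trp, so finding Trp in 63 of the top 100 complexes is a strong enrichment. The Ki ratio for compound 9 is roughly eightfold, in line with the order-of-magnitude improvement described in the text.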
To understand the molecular interactions of the enzyme-inhibitor complex in detail, we carried out an X-ray crystallographic analysis of the SARS-CoV 3CL protease complex containing compound 4. The preliminary results revealed a conformational change, presumably induced by the presence of the ligand, in the loop region near the active site (Figure 1A). Differential electron density maps in the presence of compound 4 indicated that the ligand was partially visible, but the molecular structure of the ligand could not be precisely defined. Computational structures of compound 4 with the 3CL protease were therefore constructed from the preliminary X-ray protein structure obtained in the presence of compound 4. This structure differed from the 3CL protease structure used in the previous modeling task, which had been derived from the protease complexed with chloromethyl ketone. We expected that this round of computational modeling would produce a more realistic protein-ligand complex model. The predicted model ligand structure would then be superimposed onto the differential electron density maps to confirm the validity of the binding mode of the ligand on the target protein. As described in the Supporting Information, the prediction of the model binding structure of compound 4 did not use any information from the differential electron density map; the purpose was to use the map as an independent test of the validity of the predicted binding mode. Visual examination of the top-ranked protein-ligand complex structures generated by all three docking software packages revealed a consensus complex structure, suggesting tight packing of the two tryptophan side chains into the S1 and S2 binding pockets, with the two Cbz moieties occupying the neighboring S3 and S1' sites.
The consensus was particularly strong with regard to the tryptophan side-chain conformation: precise hydrogen-bonding configurations that satisfied the hydrogen-bond donors in the tryptophan side chains were conserved throughout the top-ranked structures, while the hydrophobic packing between the tryptophan side chains and the ligand-binding sites was tightly matched. The induced hydrogen-bonded local conformation involving the Asn142 side chain fitted perfectly to accommodate the tryptophan side chain as part of the S1 binding site in the model structure, while His41 formed a hydrogen bond with the other indole group at the S2 site (Figure 1B). The binding conformations of the Cbz moieties, however, were less certain in the consensus complex structures. This uncertainty resulted from a lack of favorable interactions between the Cbz moieties and the substrate-binding site of the protein, and it was also manifested in the incomplete X-ray crystallographic data for the Cbz groups. The paucity of favorable Cbz-protein interactions in the complex model suggested that binding affinity and specificity could be improved further by replacing the Cbz moieties with other functional groups. The consensus complex structure was compared with the differential electron density map by superimposing the map on the modeled binding mode of compound 4. The comparison indicated that all the visible fragments in the differential electron density maps were accounted for by the model ligand conformation in the active site, especially the positions of the indole groups. The consistency between the consensus model structures and the differential electron density map supported the validity of the model complex structure.
The ligand-protein complex shown in Figure 1B was different from the top-ranked Trp-Trp binding mode predicted in the preliminary modeling task (for differences see Figure S2). In retrospect, although the discrepancies could arise from differences in the target protein structures and the ligand molecule structures used in the two modeling tasks, the partial agreement between the two models (Figure S2) nevertheless suggested that the preliminary modeling effort provided a productive direction for improving the ligand-protein binding affinity. The subsequent refinement of the complex structure significantly improved the accuracy of the binding model and provided a working model for further optimization of the lead compounds.

In summary, we have used computational technology to optimize a lead inhibitor (TL-3) of the 3CL protease. The C2-symmetric diol 9 was then designed and synthesized, and showed higher affinity than the previous lead compound (TL-3) by one order of magnitude in its inhibition constant (0.6 → 0.073 μM). Compounds 4 and 9 represent the most potent noncovalent 3CL protease inhibitors with structural analyses reported to date. This was a successful example of lead optimization by computer modeling prediction, and both potency and selectivity (over HIV protease) were achieved. Though the electron density map of the complex has not been well defined, the preliminary X-ray structure can be used to support the computer modeling analysis of diol 4 in complex with the 3CL protease, which indicated that the NH groups of both indole rings formed hydrogen bonds with the side chains of Asn142 and His41. We believe that this approach has provided a platform for further lead optimization.

This work is supported by the National Science Council, Taiwan and the Genomics Research Center, Academia Sinica. The SYBYL computation was conducted at the National Center for High Performance Computing, Taiwan.
The DISCOVERY STUDIO 1.2 SBD computation was conducted at the computational center of the Academia Sinica.

Keywords: computer modeling · inhibitors · medicinal chemistry · proteases · structure-based design