key: cord-024100-lk67yfrp
authors: Plewczynski, Dariusz; Hoffmann, Marcin; Von Grotthuss, Marcin; Ginalski, Krzysztof; Rychewski, Leszek
title: In Silico Prediction of SARS Protease Inhibitors by Virtual High Throughput Screening
date: 2007-04-24
journal: Chem Biol Drug Des
DOI: 10.1111/j.1747-0285.2007.00475.x
sha:
doc_id: 24100
cord_uid: lk67yfrp

A structure-based in silico virtual drug discovery procedure was assessed with severe acute respiratory syndrome coronavirus main protease serving as a case study. First, potential compounds were extracted from protein–ligand complexes selected from the Protein Data Bank database based on structural similarity to the target protein. Next, the set of compounds was ranked by docking scores using an Electronic High-Throughput Screening flexible docking procedure to select the most promising molecules. The set of best performing compounds was then used for a similarity search over the 1 million entries in the Ligand.Info Meta-Database. Selected molecules having a close structural relationship to 2-methyl-2,4-pentanediol may provide candidate lead compounds toward the development of novel allosteric severe acute respiratory syndrome protease inhibitors.

The World Health Organization (WHO) has reported over 8000 SARS cases and nearly 800 deaths resulting from infection with the SARS-associated coronavirus (SARS-CoV; 2). Since about 2003, various SARS-CoV protein targets for drug discovery have been identified, including SARS protease, polymerase and helicase (3). This study describes an in silico method that captures key features of potential inhibitor molecules to provide specificity and address opportunities for chemical biology and drug design (4). Our approach is based on experimental information contained in publicly available databases, therefore presenting a foundation for experimental validation.

Genomic research provides an ever increasing number of potential drug targets.
Structural biology allows intensive use of available, experimentally verified structural data in various computational projects. In this study, we exploited structural homologs of SARS-CoV protease co-crystallized with small molecules to explore opportunities for drug design of potential inhibitors for this therapeutic target enzyme.

The crystal structures for all members of the Structural Classification of Proteins (SCOP; 5-7) protein family of viral cysteine proteases of trypsin-fold were extracted from the Protein Data Bank (PDB) database (8, 9). This family is a part of the trypsin-like serine protease superfamily that possesses a closed barrel-type structure and consists of two domains of the same Greek-key duplication. The SCOP family of viral cysteine proteases encompasses three different groups of structures. The first group is 3C cysteine protease (picornain 3C) as exemplified by three proteins: human rhinovirus type 2 (1CQQ); human hepatitis A virus (1QA7, 1HAV); and poliovirus type I (1L1N). The second group is 2A cysteine proteinase as singularly exemplified by the protein from human rhinovirus 2 (2HRV). The third group is coronavirus main proteinase (3Cl-pro, putative coronavirus nsp2) as exemplified by four proteins: transmissible gastroenteritis virus (1LVO); transmissible gastroenteritis virus (1P9U); human coronavirus (1P9S); and SARS coronavirus (1Q2W, 1UJ1, 1UK2, 1UK3, 1UK4). In addition, a sequence similarity search was performed against all proteins in the PDB database in order to find homologous protein structures not included in the recent version of the SCOP database.

The list of ligands co-crystallized with cysteine proteases presents a significant chemical diversity and includes peptides, small molecules, and inorganic salts. Table 1 summarizes the ligand structures and includes their resolution as well as information about the location of their binding inside or outside the common active site.
Of these ligand structures, two peptides and one small molecule (i.e. chloroacetone) have been reported in the crystal structure of the SARS coronavirus protease (1UK4). Table 1 summarizes the known ligands co-crystallized with the SCOP family of viral cysteine proteases of trypsin-fold and deposited in the PDB database. Our approach considers such molecules to be relevant candidate lead compounds (or lead fragments) for further chemical modifications (e.g. peptide bond replacement by non-hydrolyzable bioisosteres). These initial lead compounds do not represent a comprehensive list, as they are limited to relevant structures deposited in the PDB database. Furthermore, some molecules that have been reported with respect to SARS drug discovery (10-17) were not included in this investigation.

The SWISS PDBVIEWER software (18) was used to align the analyzed structures in three-dimensional (3D) space. First, we divided all available protein chains into single domains. The domains were then structurally aligned in order to analyze the binding mode of each ligand in their active sites. From the aligned structures we extracted the 1UK4 protein active site with all ligands located within it. In our drug design strategy, we used 1UK4 as the template for a flexible docking experiment in order to adjust the conformation of ligands in the new structural context. The target structure with structurally aligned ligands (co-crystallized with structurally homologous proteins) was then subjected to further analysis in order to address any inconsistencies in the PDB database entries for certain ligands. In some instances, ligands co-crystallized in protein structures deposited in the PDB database were lacking defined atoms, or their functional groups had a deformed 3D representation.
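Deformed functional groups of this kind can be flagged with a simple interatomic distance check before docking. The following is a minimal sketch, not the procedure used in this study: the function names, the reference separation of about 2.2 Å between the two oxygens of a carboxylate group, and the tolerance are all assumptions made for illustration, and the coordinates are hypothetical.

```python
import math

# Assumed reference value: the two oxygen atoms of a carboxylate group are
# roughly 2.2 Å apart. The tolerance is an illustrative choice, not a
# published cutoff.
EXPECTED_O_O_DISTANCE = 2.2
TOLERANCE = 0.3


def distance(atom_a, atom_b):
    """Euclidean distance between two (x, y, z) coordinates, in Å."""
    return math.dist(atom_a, atom_b)


def carboxylate_looks_deformed(o1, o2,
                               expected=EXPECTED_O_O_DISTANCE,
                               tolerance=TOLERANCE):
    """Return True if the O...O separation deviates from the expected value."""
    return abs(distance(o1, o2) - expected) > tolerance


# Hypothetical coordinates: one plausible group and one collapsed group.
plausible = ((0.0, 0.0, 0.0), (2.2, 0.0, 0.0))
collapsed = ((0.0, 0.0, 0.0), (0.74, 0.0, 0.0))

for label, (o1, o2) in (("plausible", plausible), ("collapsed", collapsed)):
    status = "deformed" if carboxylate_looks_deformed(o1, o2) else "ok"
    print(f"{label}: O...O = {distance(o1, o2):.3f} Å -> {status}")
```

A flagged group would still need manual inspection and remodeling; the check only points to where the deposited coordinates are suspicious.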
In the case of the SARS-CoV mPro enzyme (1UK4), the ligand molecule, chloroacetone, was represented only by the chlorine atom, hence oxygen and carbon atoms needed to be added to obtain a complete molecule. In the same structure (1UK4), a pentapeptide inhibitor was contained in the crystal structure; however, its C-terminus (a carboxylic acid) was deformed to such a degree that the carbon atom had sp3 instead of sp2 hybridization. Moreover, the distance between the two oxygen atoms in the terminal carboxylate moiety was only 0.740 Å and 1.066 Å for chains G and H, respectively. The original study (19) indicates that one should expect a typical carboxylic moiety at the C-terminus of the inhibitory pentapeptide, hence further analysis first required manual remodeling to reconstitute the desired molecules.

We have explored the use of structural information contained in the PDB database for an in silico virtual drug discovery campaign using, as a case study, the main protease of SARS-CoV. A similar approach using HIV protease as a case study has been reported (20, 21). Two in silico methods were employed to evaluate the structural information gathered from the PDB database. The first method was Electronic High-Throughput Screening (eHiTS), an exhaustive flexible docking method that systematically covers a significant part of the conformational and positional search space to produce highly accurate docking poses at a speed practical for virtual high-throughput screening (22). The second method was Ligand.Info, a system designed for fast, sensitive, virtual high-throughput screening of small-molecule databases (23). The Ligand.Info search algorithm is based on two-dimensional structure similarity.
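As an illustration of two-dimensional similarity searching, the sketch below scores compounds with a plain Tanimoto coefficient computed on sets of structural features and keeps those above a cutoff. It is not the Ligand.Info implementation, whose index profiles are not reproduced here: the feature sets, compound names, and function names are hypothetical, and the default cutoff of 0.60 is borrowed from the similarity threshold reported later in this paper.

```python
def tanimoto(features_a, features_b):
    """Tanimoto coefficient of two feature sets: |A & B| / |A | B|."""
    a, b = set(features_a), set(features_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty feature sets
    return len(a & b) / len(a | b)


def similar_compounds(query, database, threshold=0.60):
    """Return (name, score) pairs scoring at or above the threshold,
    sorted from most to least similar."""
    hits = [(name, tanimoto(query, features))
            for name, features in database.items()]
    hits = [hit for hit in hits if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical feature sets standing in for 2D structural descriptors.
query = {"C-C", "C-O", "O-H", "C-H", "C(C)(C)O"}
database = {
    "analog_a": {"C-C", "C-O", "O-H", "C-H"},
    "unrelated": {"C=O", "N-H", "C-N", "C-H"},
}

for name, score in similar_compounds(query, database):
    print(f"{name}: {score:.2f}")
```

Real screening systems replace the toy feature sets with precomputed fingerprints or index profiles so that a query can be compared against a million entries quickly.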
The developed system enables a search for similar compounds in the large Ligand.Info Meta-Database (24), which contains various publicly available sets of small molecules, including: (i) Harvard's ChemBank, which encompasses bioactive compounds and FDA-approved drugs (25); (ii) ChemPDB, ligands marked as Hetero Atoms in PDB files (26, 27); (iii) KEGG Ligand, molecules that are found in the KEGG pathways (28); and (iv) the Open National Cancer Institute database (29). The total size of the Meta-Database exceeded 1 million entries.

Using this method, plausible inhibitors were generated based only on the set of ligands from crystallized complexes of a protein target and other proteins from its structurally homologous family. The docking was performed on small molecules and short peptides extracted from protein-ligand complexes of the viral cysteine proteases of trypsin-fold. These small molecules and peptides were then modified in order to correct chemical attributes of the ligand structures. This set of inhibitors was next evaluated by the eHiTS flexible docking algorithm. The top-ranked ligands are summarized in Table 2 (see Figure 1 for chemical structures). The best scoring compounds of the first set were the original AG7 ligand (eHiTS docking score of −5.615) and two analogs of AG7 (eHiTS docking scores of −5.103 and −4.803, respectively). The fourth and fifth best scoring compounds were the original peptides of the SARS target (PDB code: 1UK4, chains H and G), which reflects the improvement of the initial 3D structure (eHiTS docking scores of −4.795 and −4.703, respectively).

Figure 1: Two-dimensional chemical structures presented in Table 2.

In the crystal structure 1UK4, the same short peptide (chains G and H) can be seen interacting with the enzyme. In the crystal structure this peptide is present in two different conformations, and when each is used as a starting point, the eHiTS results differ (cf.
peptides type 1 chain H and type 1 chain G in Table 1). The same effect was observed for a very simple and flexible molecule, 2-methyl-2,4-pentanediol (MPD), which exists in crystal structures in various conformations that, when used as starting points for eHiTS calculations, yielded different results. Thus, eHiTS results depend on the choice of the initial ligand conformation, for several possible reasons: (i) some parts of the molecule recognized by eHiTS as rigid blocks may not be rigid and should not be treated as rigid; (ii) some flexible parts may not be flexible enough; and (iii) the search may not be adequately comprehensive. We tend to think that the dependence of results on the initial conformation stems from the combination of (i) and (iii), because rings within molecules are virtually identical between the various poses produced by the program. Unfortunately, this suggestion could not be confirmed, as we could not identify any information about which parts of a molecule are treated by eHiTS as rigid blocks.

Further, each molecule shown in Table 1 was used as a query for screening the Ligand.Info Meta-Database. Unfortunately, almost none of the designed potential lead compounds had a close analog in any Meta-Database subset. Experimental confirmation will obviously require chemical synthesis and biologic testing. However, the allosteric inhibitor MPD was an exception, as 21 similar analogs (with MTC ≥ 0.60) were found using MPD as a query.

In conclusion, a series of lead compounds as potential SARS protease inhibitors has been preliminarily identified using a structure-based in silico virtual drug discovery approach. However, it is stressed that no MPD analogs have been reported to date in SARS protease inhibitor drug discovery (10-17, 30-34).
Also importantly, MPD is a chemical additive used for crystallization of biologic macromolecules, and it has been found in co-crystal structures with various proteins, not only those of the SCOP family of viral cysteine proteases of trypsin-fold (35-43). Relative to SARS protease, MPD may provide a candidate lead compound (or fragment) for drug discovery. Several of the selected MPD analogs identified in this study are being tested experimentally (SEPSDA Sino-European Commission Project) with respect to their potential SARS protease inhibitory properties.

1. Inhibition, escape, and attenuated growth of severe acute respiratory syndrome coronavirus treated with antisense morpholino oligomers
2. Severe acute respiratory syndrome (SARS) - paradigm of an emerging viral infection
3. From genome to antivirals: SARS as a test tube
4. First volume of Chemical Biology & Drug Design: strategic vision for the advancement of innovative science, technology and medicine
5. SCOP: a Structural Classification of Proteins database for the investigation of sequences and structures
6. SCOP database in 2004: refinements integrate structure and sequence family data
7. SCOP database in 2002: refinements accommodate structural genomics
8. The Protein Data Bank and the challenge of structural genomics
9. The Protein Data Bank
10. From genome to drug lead: identification of a small-molecule inhibitor of the SARS virus
11. Synthesis and activity of an octapeptide inhibitor designed for SARS coronavirus main proteinase. Peptides
12. Structure-based drug design and structural biology study of novel nonpeptide inhibitors of severe acute respiratory syndrome coronavirus main protease
13. Production of Authentic SARS-CoV M(pro) with Enhanced Activity: Application as a Novel Tag-cleavage Endopeptidase for Protein Overproduction
14. Design of wide-spectrum inhibitors targeting coronavirus main proteases
15. Synthesis, crystal structure, structure-activity relationships, and antiviral activity of a potent SARS coronavirus 3CL protease inhibitor
16. Mining SARS-CoV protease cleavage data using non-orthogonal decision trees: a novel method for decisive template selection
17. An episulfide cation (thiiranium ring) trapped in the active site of HAV 3C proteinase inactivated by peptide-based ketone inhibitors
18. SWISS-MODEL and the Swiss-Pdb-Viewer: an environment for comparative protein modeling. Electrophoresis
19. The crystal structures of severe acute respiratory syndrome virus main protease and its complex with an inhibitor
20. Exploring the stereochemical requirements for protease inhibition by ureidopeptides
21. Conformational studies of irreversible HIV-1 protease inhibitors containing cis-epoxide as an amide isostere
22. Software tools for structure based rational drug design
23. Ligand-Info, searching for similar small compounds using index profiles
24. Ligand.Info small-molecule Meta-Database
25. From knowing to controlling: a path from genomics to drugs using small molecule probes
26. The Protein Data Bank
27. E-MSD: the European Bioinformatics Institute Macromolecular Structure Database
28. LIGAND: database of chemical compounds and reactions in biological pathways
29. Comparison of the NCI open database with seven large chemical structural databases
30. Design and synthesis of peptidomimetic severe acute respiratory syndrome chymotrypsin-like protease inhibitors
31. A new lead for nonpeptidic active-site-directed inhibitors of the severe acute respiratory syndrome coronavirus main protease discovered by a combination of screening and docking methods
32. Synthesis of 1-benzyl-3-(5′-hydroxymethyl-2′-furyl)indazole analogues as novel antiplatelet agents
33. Discovery of potent anilide inhibitors against the severe acute respiratory syndrome 3CL protease
34. Small molecules targeting severe acute respiratory syndrome human coronavirus
35. The 1.19 Å X-ray structure of 2′-O-Me(CGCGCG)(2) duplex shows dehydrated RNA with 2-methyl-2,4-pentanediol in the minor groove
36. An overview on 2-methyl-2,4-pentanediol in crystallization and in crystals of biological macromolecules
37. The organic crystallizing agent 2-methyl-2,4-pentanediol reduces DNA curvature by means of structural changes in A-tracts
38. Urinary excretion of 2-methyl-2,3-butanediol and 2,3-pentanediol in patients with disorders of propionate and methylmalonate metabolism
39. Structure of an acidic phospholipase A(2) from the venom of Deinagkistrodon acutus in a new crystal form
40. Three-dimensional structure of a fluorescein-Fab complex crystallized in 2-methyl-2,4-pentanediol
41. Increased temperature and 2-methyl-2,4-pentanediol change the DNA structure of both curved and uncurved adenine/thymine-rich sequences
42. 'Chiral perturbation factor' approach reveals importance of entropy term in stereocontrol of the 2,4-pentanediol-tethered reaction
43. Purification, crystal growth and preliminary X-ray analysis of a phospholipase A(2) from venom of Agkistrondon acutus