key: cord-0692683-rsjq8jo6 authors: Tumskiy, Roman S.; Tumskaia, Anastasiia V. title: Multistep rational molecular design and combined docking for discovery of novel classes of inhibitors of SARS-CoV-2 main protease 3CLpro date: 2021-07-14 journal: Chem Phys Lett DOI: 10.1016/j.cplett.2021.138894 sha: f9b19045b9457d51ef3b9cb633d75c4166d9ff05 doc_id: 692683 cord_uid: rsjq8jo6 The main protease (3CLpro) of SARS-CoV and SARS-CoV-2 is a promising target for discovery of novel antiviral agents. In this paper, new possible inhibitors of 3CLpro with high predicted binding affinity were detected through multistep computer-aided molecular design and bioisosteric replacements. For discovery of prospective 3CLpro binders several virtual ligand libraries were created and combined docking was performed. Moreover, the molecular dynamics simulation was applied for evaluation of protein-ligand complexes stability. Besides, important molecular properties and ADMET pharmacokinetic profiles of possible 3CLpro inhibitors were assessed by in silico prediction. The global pandemic caused by SARS-CoV-2 became a serious challenge for humans. However, there is still no effective approved antiviral drug for the treatment of Covid-19. Coronavirus (CoV) belongs to the family Coronaviridae of the order Nidovirales. Family Coronaviridae is classified into four genera: alpha, beta, delta and gamma among which the representatives of alpha and beta viruses infect humans [1] . Coronavirus is a positive, singlestranded RNA virus that can infect both humans and animals [2] . Human coronavirus is recognized as one of the fastest-evolving viruses from their high element nucleotide replacement rate and recombination [3] . At the last update on 20 April 2021 WHO reported 141057106 coronavirus confirmed cases and 3015043 confirmed deaths in 223 countries [4] . The 3C-chymotrypsin-like protease of coronavirus (3CLpro) is main cysteine protease composed of approximately 300 amino acids and three domains [5] . The 3C-like protease in the active form is a homodimer with a non-classical Cys-His catalytic dyad. This enzyme has a pivotal role for viral genome replication, transcription and other important viral life processes. Inhibition of 3CLpro can effectively block viral RNA transcription and replication. Since humans do not have a homologous protease, the 3CLpro protein is a highly specific antiviral target. Therefore, 3CLpro is a promising target for the development of potential anti-coronavirus drugs. The crystal structure of 3CLpro from SARS-CoV-2 was deposited in the protein database (PDB ID: 7L0D) and revealed a homodimeric form very similar to that of SARS-CoV-1 (PDB ID: 3V3M). They share 294 identical amino acids from 306 residues, and thus their sequence identity is 96%. Active site residues of 3CLpro are involved in substrate binding and formed important S3-S1 ʹ enzyme pockets. Amino acid residues Phe140, Leu141, Asn142, His163, Glu166 form S1 subsite and the Cys145-His41 catalytic dyad together with Glu143 and Ser144 form S1' subsite. At the same time, S2 is formed by Met49, Tyr54, His164, Asp187, Arg188 and S3 pocket is formed by Met165, Leu167, Gln189, Thr190 and Gln192 [6] . Covalent irreversible inhibitors of 3CLpro act via electrophilic attack to the Cys-residue and have significant antiviral activity with relatively long-term effect. However, covalent inhibitors are low selective and have serious adverse effects [7] . ML188 -one of the most effective non-covalent inhibitors of SARS-CoV and SARS-CoV-2 main protease. Moreover, ML188 has demonstrated approximately 2-fold higher inhibitory potential to SARS-CoV-2 as compared to SARS-CoV [8, 9] . The present work is aimed on discovery of novel non-covalent possible SARS-CoV and SARS-CoV-2 3CLpro inhibitors among azachalcones derivatives by using multistep computeraided molecular design with combined docking. The chemistry of chalcones is attractive to researchers because of its simplicity of synthesis and high reactivity, which gives a great possibility for construction of various potential bioactive scaffolds. Many natural and synthetic chalcones have a wide range of biological activities, including anticancer [10] , antimicrobial [11] and other activities. Furthermore, different chalcones have demonstrated wide antiviral activity [12] . Thus, natural chalcones and their synthetic analogs (in particular, azachalcones) are suitable and synthetically available drug-like compounds for the development of novel antiviral agents [13, 14] . Virtual libraries of 3-(het)aryl-1-(pyridine-2-yl)prop-2-en-1-ones (azachalcones) derivatives (in summary, 150 predominantly new drug-like compounds, Table S1 ) were created in ACD/Chemsketch ver. 2019.2.1 [15] and ligands were structurally optimized by using Universal Force Field [16] with steepest descent algorithm (500 steps) in Avogadro ver. 1.2.0 software [17] . All files with ligands (.mol) were converted to corresponding formats (.mol2 and .pdbqt) with addition of ionization and tautometic states at pH 7.4 by using OpenBabel ver. 3.0.0 software [18] . The X-Ray crystal structures of 3C-chymotrypsin-like protease for SARS-CoV (PDB ID: 3V3M) and SARS-CoV-2 (PDB ID: 7L0D) have been taken from the Protein Data Bank at 1.96 and 2.39 Å resolutions, respectively [8, 9] . Inhibitors and solvents were removed from enzymes, protonation and Kollman charges were added (Gasteiger charges were used for ligands). Cocrystallized ligand of 3V3M and 7L0D (inhibitor ML188) was extracted by using iGEMDOCK ver. 2.1 software [19] , it was converted to corresponding formats and used as a control in docking studies. Molecular docking was performed by using CCDC GOLD Suite 5.3 software [20] and a protocol Autogrid-AutoDock Vina on MGL Tools ver. 1.5.6 platform [21, 22] . GOLD (Genetic Optimization for Ligand Docking) is a genetic algorithm for docking flexible ligands into protein binding sites. In this study, GOLD was used only for primary fast docking (calculation of fitness) and identification of prospective 3CLpro binders. GoldScore [23] was selected as a scoring function with maximal search efficiency and number of operations. The default parameters of the automatic settings were used to set the genetic algorithm parameters. For docking the active site of 3CLpro (3V3M) with radius 5 Å (binding area around inhibitor) was used. Input ligands with full protonation were used in .mol2 format. For re-docking and prediction binding affinity of possible 3CLpro inhibitors (with prediction of protein-ligand interactions) a protocol Autogrid-AutoDock Vina was performed. For docking of SARS-CoV (3V3M) and SARS-CoV-2 (7L0D) main protease next parameters were used: number grid points in xyz (40 70 40 for 3V3M; 60 40 60 for 7L0D), spacing (0.775), grid center in xyz (20.597 -30.459 -6.801 for 3V3M; 11.701 -17.509 16.43 for 7L0D). Other parameters were set to default. Input ligands with polar hydrogens were used in .pdbqt format. Results of prediction protein-ligand interactions by docking studies were visualized by using UCSF Chimera ver. 1.14 [24] . 2D-schematic representations of the protein-ligand interactions were generated by using ProteinsPlus [25] . Molecular dynamics (MD) simulation of protein-ligand complexes was performed by using WebGRO and CABS-flex ver. 2.0 [26, 27] . For evaluation of complexes stability two main parameters, such as RMSD (root mean square deviation) and RMSF (root mean square fluctuation) were assessed. RMSD profiles of protein-ligand complexes were evaluated by using WebGRO server. Briefly, the best-docked protein-ligand complexes were prepared for MD by using GROMOS96 43a1 forcefield. The ligand topology was generated by using PRODRG tool [28] . SPC was selected as a solvent model (triclinic water box with size 50x75x70 Å) for protein-ligand complex [29] . This system was neutralized by adding sodium or chlorine ions based on the total charges. For minimization of the system before MD the steepest descent algorithm (5000 steps) was applied. The MD simulations were performed in the presence 0.15 M NaCl using the constant temperature (300 K) and pressure (1.0 bar). Approximate number of frame per simulation was 1000. The simulation time was set to 100 ns. CABS-flex ver. 2.0 was used for evaluation of 3CLpro structure flexibility (RMSF). Number of cycles was set to 50. Cycles between trajectory frames was adjusted to 50. Other parameters were set to default. Molecular properties such as Log P (lipophilicity), TPSA (topological polar surface area), MW (molecular weight), molecular volume, n-ON acceptors (number of hydrogen bond acceptors), n-OHNH donors (number of hydrogen bond donors), n-rotb (number of rotatable bonds) were calculated for 18 novel derivatives of azachalcones as possible 3CLpro inhibitors by using Molinspiration property engine version 2018.10 [30] . In silico pharmacokinetic and toxicity parameters (ADMET profile) such as %ABSi (percentage of intestinal absorption in human), P-gp (P-glycoprotein) and cytochrome P-450 isoforms (CYP) substrate and inhibitory potential, logBB (logarithm of permeability in bloodbrain barrier), logPS (blood-brain permeability-surface area product), potential as substrate of renal organic cation transporter 2 (OCT2), AMES toxicity and inhibition of human ether-a-go-gorelated gene (hERG) channels were predicted by using the pkCSM platform [31] . For discovery of novel possible non-covalent 3CLpro inhibitors we applied a multistep computer-aided molecular design with combined docking (GoldScore with AutoDock Vina). Our combined docking is very suitable and useful method for discovery of possible inhibitors of proteins. Present approach includes of using consistently GoldScore function (primary docking and selection of potential enzyme binders from ligand libraries) and AutoDock Vina for re-docking of GoldScore selected possible inhibitors with binding affinity calculation. In fact, this combined docking is able significantly to reduce the time for identification of prospective hit compounds from massive ligand libraries. Furthermore, GoldScore fitness has sufficiently good agreement with AutoDock Vina binding affinity (Fig. 1) . In the first step of molecular design of 3CLpro inhibitor, a small virtual library of azachalcones (about twenty ligands) was created for primary fast docking (GoldScore) and identification of prospective binders of SARS-CoV 3CLpro (structures, SMILES and calculated fitness for all ligands are reported in the Supplementary data). GoldScore scoring function was selected for primary docking, because this function has good correlation with affinity [32] . Molecular docking of azachalcones library with the active site of 3CLpro (3V3M) was performed and hit compound 9 was identified (Fig. 2, Table S1 ). Calculated fitness for compound 9 was comparable with fitness of 3CLpro inhibitor ML188. At the same time, the hit 9 has a less binding affinity for SARS-CoV main protease (ΔG is -7.3 kcal/mol) in comparison with ML188. Importantly, that azachalcone 9 is well accommodated in the active site of 3CLpro and the 2-pyridyl group with 4-benzyloxyphenyl moiety of 9 occupies the S2-S3 pockets of protein. Furthermore, the 2-pyridyl group of 9 is able to interact with the catalytic dyad of enzyme (π-π stacking with His41). However, this ligand does not show the occupation of important S1 and S1' pockets of protease. Moreover, by docking studies, the compound 9 has not any hydrogen bonds within active site of 3CLpro, that possible explains a lower predicted affinity of azachalcone 9 in comparison with ML188. In the next stages of molecular design, the chemical structure of hit compound 9 was modified for discovery of 3CLpro possible inhibitors with enhanced binding affinity. Based on azachalcone 9, the libraries of novel compounds were created by using computer-aided molecular design with classical and non-classical bioisosteric replacements (SwissBioisostere database) [33] . After molecular docking of these libraries, 18 novel derivatives of azachalcones were identified as possible 3CLpro binders using GoldScore function. AutoDock Vina was used for re-docking of potential inhibitors and calculation of their binding affinity toward SARS-CoV and SARS-CoV-2 main protease (Table 1) . [8] . In fact, the discovered possible 3CLpro inhibitors may be presented as two main classes (A and B) of chemical compounds (Fig. 2) . Class A (in particular, compounds 84 and 146) is substituted 3-[2-(1,2,4-oxadiazol-3-yl)phenyl]-1-(pyridin-2-yl)prop-2-en-1-ones. These structures were developed as prospective 3CLpro inhibitors by generating classical and non-classical bioisosteres based on hit compound 9 by using SwissBioisostere database. The benzyloxy moiety of hit 9 was chosen for selection of bioisosteres. Thus, the 3-substituted oxazole ring was selected (among ~1500 approved bioisosteres) as a bioisostere of the benzyloxy moiety of hit compound 9. This replacement leads to decrease of high lipophilicity of 9 with increase of hydrogen bond formation ability. For further modifications (to improve binding affinity and potential ADMET profile) the oxazole ring was converted to oxadiazole moiety with different aromatic or heteroaromatic substituents. As a result, possible 3CLpro inhibitors 84 and 146 were discovered. Compounds 84 and 146 as lead compounds have the best inhibitory potential in silico for 3CLpro among all ligands of libraries (Table 1 ). These compounds have demonstrated significantly higher binding affinity as compared to ML188. By docking data, ligands 84 and 146 are well accommodated within the active site of 3CLpro with good occupation of S1-S3 pockets (Fig. 3, 4) . In fact, these compounds have very similar spatial location in the active site of enzyme. The substituted oxadiazole ring of 84 and 146 occupy the important S1 and S3 pockets. Highly lipophilic groups of 84 and 146 (trifluoromethylphenyl and tert-butylphenyl, respectively) are well located for the occupation of deep S2 enzyme pocket. Unfortunately, ligands 84 and 146 do not fully occupy the S1ʹ pocket. Possibly, the picolinoyl group of ligands needs to be replace with more hydrophobic substituent for full occupation of the S1ʹ pocket. By docking studies, prospective lead compounds 84 and 146 have few key hydrogen bonds within active site of 3CLpro (Fig. 4A, B) . In the case of 84, the ligand is able to interact with backbone of Gly143 (2.25 Å) to form hydrogen bond. In the case of 146, the nitrogen of the pyridine ring and the carbonyl group of the picolinoyl moiety can interact through hydrogen bonds with the side chain of His163 (1.91 Å) and Asn142 (2.32 Å), respectively. Moreover, this ligand is able to form hydrogen bond with backbone of Gly143 (2.01 Å). Interestingly, that the predicted interactions of 146 (hydrogen bonds) in the active site of enzyme are similar to ML188 interactions. Moreover, a spatial location of tert-butylphenyl moiety and 3-pyridyl group is identical for 146 and ML188. Importantly, note that AutoDock Vina and ProteinsPlus are detected hydrogen bonds of ligands 84 and 146 (the oxygen atom of oxadiazole ring) with backbone of Glu166 (Fig. 4A, B) . However, despite this programs predict such interaction, in fact its real contribution would be very low and the geometry would be significant different (much longer distance between H and O). Class B (in particular, compound 113) is para-substituted 3-phenyl-3-(6-[methoxypyridin-4-yl]pyridin-3-yl)-1-(pyridin-2-yl)prop-2-en-1-ones. The lead compound 113 was discovered by using transformation hits 9 and 64 to new hybrid-structure 113 (Fig. 2) . Interestingly, that this compound has higher calculated binding affinity for SARS-CoV-2 (ΔG is -9.0 kcal/mol) as compared to ML188. Protein-ligand interactions and predicted spatial location of 113 are shown in Fig. 5 . The lead compound 113 well occupies S1-S3 pockets of SARS-CoV-2 3CLpro active site, but this ligand hasn't an ideal spatial geometry for full occupation of hydrophobic S1ʹ pocket. By docking data, the carbonyl group of 113 (4-benzoyl moiety) has hydrogen bond with NH-group of Glu166 backbone (1.91 Å). Moreover, the nitrogen of the pyridine ring can interact with OH-group of Thr25 side chain (3.21 Å) to form weak hydrogen bond. Furthermore, the 2-pyridyl group of ligand interacts with the catalytic residue His41 through π-stacking (T-shaped). For evaluation of protein-ligand complexes stability the MD simulations were performed. Besides, the MD simulations are able to evaluate of ligand-inducing changes in the protein structure. The RMSD profiles of the docked complexes were generated for the best 3CLpro binders 84, 113 and 146 by using WebGRO server (Fig. 6A, B) . The RMSD profiles of co-crystallized inhibitor of main protease (ML188) were used as a control for comparison with novel possible 3CLpro binders. As shown in Fig. 6A , protein complex with ligand 84 is quite rigid with RMSD <3.5 Å and it has similar trend (after 80 ns) as ML188 complex for the final stage of dynamics. At the same time, 3CLpro complex with compound 113 was seen to have flexible nature at the initial phase of simulation, it reached at stable state after about 40 ns. Moreover, protease complex with 146 (the MD video for 3CLpro complex with 146 is available in the Supplementary data) has demonstrated more flexibility in comparison with 84 and ML188 complexes. From Fig. 6B , it is observed that ligands 113 and 146 have lower RMSD as compare to compounds 84 and ML188. Therefore, 3CLpro binders 113 and 146 are well located in the active site of enzyme according to molecular dynamics studies. At the same time, ligand 84 has high RMSD (~ 6 Å with stable state after 50 ns) and its RMSD profile is similar to RMSD profile of ML188. However, compound 84 is still located near the Cys145-His41 catalytic dyad and it forms hydrogen bonds with residues Cys145 and Glu166 according to the last snapshot of MD simulations (data not shown). The RMSF profiles of 3CLpro complexes with compounds 84, 113 and 146 were generated by using CABS-flex (Fig. 7) . CABS-flex is efficient instrument for fast simulations of protein residues flexibility, which has good correlation with protein flexibility data by NMR-spectroscopy [34] . As seen from Fig. 7 , the complex of protein with 113 exhibited high fluctuations for Glu47, Asp48, Met49, Leu50, Asn51 and Pro52. In the case of 113, residues Ile106 and Pro108 have demonstrated more flexibility in comparison with other complexes. Thus, the complex of 3CLpro with compound 113 possibly has a less stability as compare to complexes of 84 and 146. In the cases of 84 and 146, the complexes didn't exhibit high deviations (RMSF < 3.5 Å). The complex of protein with ligand 84 has flexible residues Glu47, Asp187, Arg188, Gln189, Gly275 and Thr304 with acceptable RMSF values. The most stable complex of 146 demonstrated flexibility only for Ser46, Glu47 and Asp48 residues. In conclusion, the RMSD and RMSF profiles are indicated the acceptable ligand-inducing changes in the protein structure and good stability of the 3CLpro complexes with compounds 84, 113 and 146. For prediction of bioavailability and molecular properties of 18 novel azachalcones derivatives (include lead compounds 84, 113 and 146) as potential 3CLpro binders were calculated (Molinspiration) and evaluated using fundamental Lipinski's and Veber's criteria [35, 36] . All compounds satisfy to Lipinski's and Veber's criteria (Table S2) with Log P (3.11 to 5.00), TPSA (64.98 to 106.19), MW (336.39 to 497.55), molecular volume (308.94 to 448.38), n-ON acceptors (4 to 7), n-OHNH donors (0 to 3) and n-rotb (4 to 9). In addition, the ADMET profiles of 18 possible 3CLpro inhibitors were predicted by using the pkCSM platform (Table S3) . For all tested compounds high intestinal absorption in human (%ABSi from 86 to 100) was predicted. Moreover, almost all compounds can be potential Pglycoprotein (P-gр I and P-gр II) inhibitors and some have substrate and inhibitory potential against cytochrome P-450 isoforms. Predicted values of logBB and logPS showed very low ability of these compounds to penetrate the central nervous system. Therefore, almost all tested compounds (include 84, 113 and 146) potentially have good oral bioavailability and ADMET profiles. For the first time, multistep computer-aided molecular design with combined docking (GoldScore with AutoDock Vina) were performed for discovery of novel possible inhibitors of SARS-CoV and SARS-CoV-2 main protease. Based on multistep rational design, several ligand libraries (150 predominantly new drug-like azachalcones derivatives) were created by using classical and non-classical bioisosteric replacements. After combined docking of these libraries, the 18 new possible 3CLpro inhibitors were identified. By the molecular dynamic simulations data, complexes of 3CLpro with the most prospective novel inhibitors (compounds 84, 113 and 146) are well stable and have the acceptable ligand-inducing changes in the protein structure. Moreover, the perspective ligands 84, 113 and 146 as lead compounds have potential good bioavailability and ADMET pharmacokinetic profiles. Thus, these synthetically available compounds are suitable scaffolds for various structural modifications and discovery of new antiviral drugs for the treatment of SARS-CoV-2. Molecular dynamics simulation was used for evaluation of protein-ligand complexes stability 4. ADMET pharmacokinetic profiles were assessed by in silico prediction Tumskiy: Conceptualization, Data curation, Formal analysis, Methodology, Writingoriginal draft Targeting SARS-CoV-2: A systematic drug repurposing approach to identify promising inhibitors against 3C-like proteinase and 2′-O-ribose methyltransferase The molecular biology of coronaviruses SARS and MERS: recent insights into emerging coronaviruses Identification of novel inhibitors of the SARS coronavirus main protease 3CLpro AI-aided design of novel targeted covalent inhibitors against SARS-CoV-2 Identification of potential binders of the main protease 3CLpro of the COVID-19 via structure-based ligand design and molecular modeling Discovery, synthesis, and structure-based optimization of a series of N-(tert-butyl)-2-(N-arylamido)-2-(pyridin-3-yl) acetamides (ML188) as potent noncovalent small molecule inhibitors of the severe acute respiratory syndrome coronavirus (SARS-CoV) 3CL protease Crystal Structure of SARS-CoV-2 Main Protease in Complex with the Non-Covalent Inhibitor ML188 Anti-cancer chalcones: Structural and molecular target perspectives Green synthesis, characterization and biological evaluation of novel chalcones as antibacterial agents A comprehensive review on the antiviral activities of chalcones In silico study of the potential interactions of 4′-acetamidechalcones with protein targets in SARS-CoV-2 In silico pharmacokinetic and molecular docking studies of natural flavonoids and synthetic indole chalcones against essential proteins of SARS-CoV-2 Advanced Chemistry Development Application of a universal force field to organic molecules Avogadro: an advanced semantic chemical editor, visualization, and analysis platform Open babel: an open chemical toolbox iGEMDOCK: a graphical environment of enhancing GEMDOCK using pharmacological interactions and post-screening analysis Protein-ligand docking and virtual screening with GOLD. In Virtual screening in drug discovery AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading AutoDock4 and AutoDockTools4: Automated docking with selective receptor flexibility Conformational landscapes of membrane proteins delineated by enhanced sampling molecular dynamics simulations UCSF Chimera -A Visualization System for Exploratory Research and Analysis Molecular Complexes at a Glance: Automated Generation of two-dimensional Complex Diagrams CABS-flex 2.0: a web server for fast simulations of flexibility of protein structures PRODRG: a tool for high-throughput crystallography of protein-ligand complexes The missing term in effective pair potentials Molinspiration Cheminformatics free web services pkCSM: predicting small-molecule pharmacokinetic and toxicity properties using graph-based signatures Improved protein-ligand docking using GOLD SwissBioisostere: a database of molecular replacements for ligand design CABS-flex predictions of protein flexibility compared with NMR ensembles Drug-like properties and the causes of poor solubility and poor permeability Molecular properties that influence the oral bioavailability of drug candidates