key: cord-0450502-yr5v1q7y authors: Macchiagodena, Marina; Pagliai, Marco; Procacci, Piero title: Inhibition of the Main Protease 3CL-pro of the Coronavirus Disease 19 via Structure-Based Ligand Design and Molecular Modeling date: 2020-02-24 journal: nan DOI: nan sha: 75d6bf54be93f580ea0de46065241dbd6d48b037 doc_id: 450502 cord_uid: yr5v1q7y

We have applied a computational strategy, based on the synergy of virtual screening, docking and molecular dynamics techniques, aimed at identifying possible lead compounds for the non-covalent inhibition of the main protease 3CL-pro of the SARS-CoV2 coronavirus. Based on the recently resolved 6LU7 PDB structure, ligands were generated using a multimodal structure-based design and then optimally docked to the 6LU7 monomer. Docking calculations show that ligand binding is strikingly similar in SARS-CoV and SARS-CoV2 main proteases, irrespective of the protonation state of the catalytic CYS-HIS dyad. The most potent docked ligands are found to share a common binding pattern with aromatic moieties connected by rotatable bonds in a pseudo-linear arrangement. Molecular dynamics calculations fully confirm the stability in the 3CL-pro binding pocket of the most potent binder identified by docking, namely a chlorophenyl-pyridyl-carboxamide derivative.

from its origin in the Hubei district of China to virtually the whole of China and, as of today, to more than thirty nations in five continents. 1 The new coronavirus, named SARS-CoV2 and believed to have a zoonotic origin, has thus far infected about 80000 people worldwide, with nearly 10000 in critical condition, causing the death of more than 3000 people. The SARS-CoV2 genome 2,3 shares a high identity 4 with that of SARS-CoV, whose epidemic started in early 2003 and ended in the summer of the same year. Most of the Coronaviridae genome encodes two large polyproteins, pp1a and, through ribosomal frameshifting during translation, 5 pp1ab. 
These polyproteins are cleaved and transformed into mature non-structural proteins (NSPs) by the two proteases 3CL pro (3C-like protease) and PL pro (papain-like protease) encoded by the open reading frame 1. 6 NSPs, in turn, play a fundamental role in transcription and replication during the infection. 5 Targeting these proteases may hence constitute a valid approach for antiviral drug design. The catalytically active 3CL pro is a dimer. Cleavage by 3CL pro occurs at the glutamine residue in the P1 position of the substrate via the protease CYS-HIS dyad, in which the cysteine thiol functions as the nucleophile in the proteolytic process. 7 While dimerization is believed to provide a substrate-binding cleft between the two monomers, 8 in the dimer the solvent-exposed CYS-HIS dyads are symmetrically located at the opposite edges of the cleft, probably acting independently. 9 As no host-cell proteases are currently known with this specificity, early drug discovery was directed towards the so-called covalent Michael inhibitors, 10 via electrophilic attack on the cysteinate of the 3CL pro dyad. On the other hand, the consensus in drug discovery is to exclude electrophiles from drug candidates, primarily for reasons of safety and adverse effects such as allergies, tissue destruction, or carcinogenesis. 11 In spite of the initial effort in developing small-molecule compounds (SMC) with anticoronavirus activity immediately after the SARS outbreak, 12 no antiviral drug was ever approved or even reached the clinical stage, due to a sharp decline in funding of coronavirus research after 2005-2006, based on the erroneous conviction of policy-makers and scientists that a new zoonotic transmission was extremely unlikely. The most potent non-covalent inhibitor for 3CL pro, ML188, was reported nearly ten years ago 13 with moderate activity in the low micromolar range. 
14 According to the latest report of the structure of 3CL pro from SARS-CoV2 15 (PDB code 6LU7) and the available structure of 3CL pro from SARS-CoV 12 (PDB code 1UK4), the two main proteases differ by only 12 amino acids, with α carbon atoms all lying at least 1 nm away from the 3CL pro active site (see Figure 1a). The substrate-binding pockets of the two coronavirus main proteases are compared in Figure 1b, exhibiting a strikingly high level of alignment of the key residues involved in substrate binding, including the CYS145···HIS41 dyad and HIS163/HIS172/GLU166. The latter residues are believed to provide the opening gate for the substrate in the active state of the protomer. 12

Figure 1: a) SARS-CoV2 (orange, PDB code 6LU7) and SARS-CoV (green, PDB code 1UK4) main proteases. Violet spheres correspond to the alpha carbons of the 12 differing residues in the two structures. Grey spheres indicate the CYS-HIS dyad. b) View of the binding pocket with the main residues in bond representation (green and red for SARS-CoV2 and SARS-CoV, respectively). The shaded region marks the binding site for the substrate.

Figures 1a and 1b strongly suggest that effective non-covalent inhibitors for SARS-CoV and SARS-CoV2 main proteases should share the same structural and chemical features. In order to investigate this matter, we have performed a molecular modeling study on both the 6LU7 and 1UK4 PDB structures. 6LU7 is the monomer of the main protease in the active state with the N3 peptidomimetic inhibitor, 15 while 1UK4 is the dimer with the protomer chain A in the active state. 12 The main protease monomer contains three domains. Domains I and II (residues 8-101 and residues 102-184) are made of antiparallel β-barrel structures in a chymotrypsin-like fold responsible for catalysis. 
16 The 6LU7 structure was first fed to the PlayMolecule web application 17 using a novel virtual screening technique for multimodal structure-based ligand design, 18 called Ligand Generative Adversarial Network (LIGANN). Ligands in LIGANN are generated so as to match the shape and chemical attributes of the binding pocket and are decoded into a sequence of SMILES, directly enabling structure-based de novo drug design. SMILES codes for ligands were obtained using the default LIGANN values for shapes and channels, with the cubic box center set at the midpoint of the vector connecting the SH and NE atoms of the CYS-HIS dyad in the 6LU7 PDB structure. The PlayMolecule interface delivered 93 optimally fit non-congeneric compounds, spanning a significant portion of the chemical space, whose SMILES and structures are reported in the Supporting Information (SI). Each of these compounds was docked to the 6LU7 and to the 1UK4 structures, using Autodock4 19 with full ligand flexibility. For both structures, the docking was repeated with the dyad residues in their neutral (CYS-HIS) and charged (CYS−/HIS+) states. Further details on the docking parameters are given in the SI. Results for the binding free energies of the 93 LIGANN-determined 3CL pro ligands are reported in Figure 2. Binding free energies lie in the range 4-9 kcal/mol and are found to be strongly correlated for the two protonation states of the CYS-HIS dyad. Correlation is still high when ligand binding free energies for the two main proteases are compared, confirming that good binders for SARS-CoV are, in general, also good binders for SARS-CoV2 3CL pro.

Figure 2 (caption fragment): Table 1 of the Supporting Information. The common color-coded z-scale on the right corresponds to the 2D probability.

For each of these compounds, using the knowledge-based XLOGP3 methodology, 20 we computed the octanol/water partition coefficient (LogP) to assess the distribution in hydrophobic and cytosolic environments. 
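As a reading aid for the docking free energies quoted in this work, the following minimal sketch converts a standard binding free energy into an approximate dissociation constant through the textbook relation ΔG° = RT ln(Kd/c°), with c° = 1 M. It is not part of the authors' protocol: the function names are hypothetical, the temperature of 298.15 K is an assumption (the docking temperature is not stated in the text), and docking scores are only rough estimates of true binding free energies.

```python
import math

# Gas constant in kcal/(mol K); binding free energies in the text are in kcal/mol.
R_KCAL = 1.987204e-3


def dissociation_constant(delta_g, temperature=298.15):
    """Approximate Kd (mol/L, 1 M standard state) from a binding free energy.

    delta_g is negative for favorable binding, e.g. -8.0 kcal/mol.
    The default temperature is an assumption, not a value from the docking setup.
    """
    return math.exp(delta_g / (R_KCAL * temperature))


def binding_free_energy(kd, temperature=298.15):
    """Inverse relation: standard binding free energy (kcal/mol) from Kd (mol/L)."""
    return R_KCAL * temperature * math.log(kd)


if __name__ == "__main__":
    # Illustrative values spanning the 4-9 kcal/mol magnitude range quoted above.
    for dg in (-4.0, -6.5, -8.0, -9.0):
        print(f"{dg:5.1f} kcal/mol -> Kd ~ {dissociation_constant(dg):.1e} M")
```

Under these assumptions, a binding free energy of -8 kcal/mol corresponds to a dissociation constant of roughly 1.4 micromolar, and each additional 1 kcal/mol of binding strength lowers Kd by a factor of about 5.4.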
LogP values range from -0.5 to a maximum of 5, with the number of rotatable bonds ranging from 2 to a maximum of 12. Most of the LIGANN compounds bear from 2 to 5 H-bond acceptors or donors (see Table 1 of the SI). In Figure 3 Table 1, the binding free energy data of these five best ligands are shown for both CoV proteases and both protonation states of the catalytic dyad. Inspection of Table 1 confirms that SARS-CoV2 Table 1 we also report the Autodock4-computed binding free energy for ML188. The Autodock4-predicted binding free energies for the association of ML188 with the SARS-CoV protease are -6.2 and -6.5 kcal/mol for the H-HIS and H-CYS tautomers, respectively, not too distant from the experimentally determined value of -8 kcal/mol, hence lending support to the LIGANN-Autodock4 protocol used in identifying the lead compounds of Table 1. In order to assess the stability of the association between 3CL pro and compound 27, we have performed extensive molecular dynamics simulations 2,3 of the bound state with explicit solvent. The overall structural information was obtained by combining data from three independent simulations (for a total of about 120 ns), all started from the best docking pose of 27 on the 6LU7 monomeric structure. Further methodological aspects 25 are provided in the Supporting Information.

Figure S1: 2D structures for compounds 1 to 20. One of the five best binders is boxed.

Molecular dynamics (MD) simulations were carried out in a cubic box with periodic boundary conditions, whose side length was chosen so that the minimum distance between protein atoms belonging to neighboring replicas was larger than 14 Å in any direction. The system (protein+compound) was explicitly solvated with the SPC/E 1 water model at the standard density. The starting configuration was generated using GROMACS 2,3 and PrimaDORAC. 
4 The system was initially minimized at 0 K with a steepest descent procedure and subsequently heated to 298.15 K in an NPT ensemble (P=1 atm) using the Berendsen barostat 5 and the velocity-rescaling algorithm 6 with an integration time step of 0.1 fs and a coupling constant of 0.1 ps for 250 ps. Production runs in the NPT ensemble were carried out by starting three independent simulations with different randomized initial velocities. Each MD run was performed for 40 ns (for a total of 120 ns), imposing rigid constraints only on the X-H bonds (with X being any heavy atom) by means of the LINCS algorithm (δt=2.0 fs). 7 Electrostatic interactions were treated by using particle-mesh Ewald (PME) 8

An Interactive Web-Based Dashboard to Track COVID-19 in Real Time
Upgrading and Validation of the AMBER Force Field for Histidine and Cysteine Zinc(II)-Binding Residues in Sites with Four Protein Ligands
Statistical Mechanics of Ligand-Receptor Noncovalent Association, Revisited: Binding Site and Standard State Volumes in Modern Alchemical Theories
The Statistical-Thermodynamic Basis for Computation of Binding Affinities: A Critical Review
An Introduction to Best Practices in Free Energy Calculations
The Missing Term in Effective Pair Potentials
GROMACS 4.5: a High-Throughput and Highly Parallel Open Source Molecular Simulation Toolkit
GROMACS: Fast, Flexible, and Free
A Free Web Interface for the Assignment of Partial Charges, Chemical Topology, and Bonded Parameters in Organic or Drug Molecules
Molecular Dynamics with Coupling to an External Bath
Canonical Sampling Through Velocity Rescaling
LINCS: A Linear Constraint Solver for Molecular Simulations
Particle Mesh Ewald: An N log(N) Method for Ewald Sums in Large Systems
Ueber die Anwendung des Satzes vom Virial in der Kinetischen Theorie der Gase