key: cord-0214222-y23pw0ln authors: Debroise, Th'eau; Hoste, Rose; Chamayou, Quentin; Minoux, Herv'e; Filoche-Romm'e, Bruno; Bianciotto, Marc; Rameau, Jean-Philippe; Schio, Laurent; Levesque, Maximilien title: In silico drug repositioning for COVID-19 using absolute binding free energy calculations date: 2021-09-08 journal: nan DOI: nan sha: 6b573b50fbcc4210c571e5ada1052ef49e060440 doc_id: 214222 cord_uid: y23pw0ln Since the rise of the SARS-CoV-2 pandemic in the winter of 2019, the need for an affordable and efficient drug has not yet been met. Leveraging its unique, fast and precise binding free energy prediction technology, Aqemia screened and ranked FDA-approved molecules against the 3ClPro protein. This protease is key to the post-translational modification of two polyproteins produced by the viral genome. We propose in our top 10 predicted molecules some drugs or prodrugs that could be repurposed and used in the treatment of COVID cases. The worldwide spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) occurred over the course of a few weeks. The first reports of cases in the province of Wuhan, China, started in December 2019 and the World Health Organization declared it a pandemic only four months later. As of December 2020, more than 202M people have been infected and 4.29M died. 1 Given its scale, its rapid spread and high mortality rate, there is an urgent need to identify small molecule drugs (SMD) that can cure contaminated people with severe cases. It is important to note that even once vaccines are available and abate the pandemic, SMDs are still relevant for the people that cannot avoid contamination: no vaccine has 100% efficiency and it is impossible to vaccinate 100% of the world population. As essential it is to find a drug against the coronavirus-disease 2019 (Covid- 19) , it remains a lengthy, costly and possibly unsuccessful process. The average cost of bringing a new compound to the market nearly doubled between 2003 and 2013 to $2.6 billion, according to the Tufts Center for the Study of Drug Development. 2 These same challenges have increased the lab-to-market timeline to 12 years, with 90 percent of drugs washing out in one of the phases of clinical trials. 2 Even if those numbers are discussed in their values, the orders of magnitude are well agreed upon and the standard drug discovery workflow is inadequate to face this challenge in an accelerated time frame. Alternatively, a drug repositioning strategy 3, 4 can be used to accelerate the process by identifying approved drugs as well as clinically tested drug candidates which have the potential to treat COVID- 19 . In general, the drugs already in use were deemed safe in prior studies and have an established large-scale industrial production. Finding one of these molecules with antiviral activity against coronavirus would be a solution to rapidly deal with the emergency of COVID-19. A large scale screening approach leads to a more exhaustive search and possibly, unexpected discoveries compared to a more classical approach. These screenings of molecules that are already approved can be applied both in wet-lab (in vitro) or virtually (in silico) for evaluating the ability of known drugs to bind against the essential proteins of SARS-CoV-2 and hence identify those with potential antiviral activity. In vitro screenings ask for the use of either the purified target protein or cells infected with the SARS-CoV-2. On one hand, the purified protein is easy enough to manipulate, but it is only a proxy to what happens in the virus in the much more complex media that is the human body. On the other hand, live viruses are complex to manipulate and require specific labs and security measures. Due to this limitation and the lower cost associated with it, in silico screenings have gathered the attention of the scientific community. However, most virtual screening suffers from inaccurate protein-ligand affinity predictions. A common source of the prediction error comes from fast empirical scoring functions used for docking. Physics-based scoring functions should be more accurate but they become expensive when entropic contributions are fully taken into account like in absolute binding free energy calculations (ABFE). To bypass this trade-off between accuracy and speed, we have leveraged our fast and precise affinity predictor to screen and rank a library of 1400 FDA-approved drugs in only a few minutes. Our predictor is a free energy calculation which can be expressed as follow : where is a Molecular Mechanic (MM) calculation summing electrostatic and Van ∆ der Waals contributions to protein-ligand interaction and is the difference of ∆ solvation energy between the complex (PL), the protein (P) and the ligand (L). Each of these solvation calculations is based on the molecular density functional theory (MDFT) 5 . The SARS-CoV-2 genome is made up of 30,000 nucleotides, coding for 28 proteins. 6 Of these 28 proteins, 16 non-structural proteins come from two large polyproteins, pp1a and pp1b. These proteins, responsible for example of the shutdown of the cell defences against viruses 7 , are then cleaved by two viral proteases. These two proteins are thus responsible for the maturation of the majority of the machinery of SARS-CoV-2. The first one, papain-like protease (PLPro) is a cysteine protease with a zinc finger domain. 8 The second one, known as the main protease (Mpro) 9,10 , also called 3CLPro is a key protein in the activity of viral replication as it is responsible for eleven of the sixteen proteolytic cleavages. 11 Its predominant role in the virus life cycle and the lack of closely related homologues in humans 12 point out 3CLPro as an attractive target for specific inhibitor design with very less chance of off-targeting. The targeting of viral proteases is a well-known and efficient strategy, which has even been employed in the treatment of HIV and hepatitis. 13 Research groups sought out to exploit the large homology between the 3CLPro of SARS-CoV-1 and SARS-CoV-2 -with more than 96% matching sequence -to accelerate the design of protease inhibitors by transferring knowledge they already had of the 2003 SARS-CoV-1 virus. 11 The catalytic site of both proteins is situated in the groove formed between the first and second domain of the protein, the third and final one being essential for the formation of a homodimer. The twelve mutations between the SARS-CoV-1 and SARS-CoV-2 protease only influence the affinity of the protease for itself, leading to a more stable homodimer in the case of SARS-CoV-2 and subsequently, a better activity. 17 Usual drugs for the SARS-CoV-1 main protease are covalent inhibitors which take advantage of residue reactivity on the main cysteine to anchor themselves. These inhibitors are usually made up of two main sub-structures : a warhead, that will chemically react with the cysteine of the catalytic dyad, and a recognition site that will interact with the sub-pockets prior to the chemical reaction. 18 In Figure 2 , sub-pockets of the binding site are depicted. They are conserved between coronaviruses 19 and each pocket is used to recognize an amino acid of the peptidic substrate. 20 Sub-pocket 2 (S2) and sub-pocket 4 (S4) both recognize small hydrophobic residues, such as leucine, valine and alanine. Sub-pocket 3 (S3) is less specific in the residue it can accommodate, which ranges from lysine to methionine. Sub-pocket 1 (S1) is, in contrast, highly specific as it only recognizes glutamine residue. There is no known protease in the human proteome with the same specificity. [21] [22] [23] A molecule that binds to these sub-pockets with a strong affinity might be a good candidate for further investigations in the treatment of COVID-19. In addition to the elucidation of the mechanism of the hydrolysis of the peptide bond, several research groups have done analyses of the energetic contribution of specific residues. These analyses broke down the interaction of an inhibitor or substrate in a per-residue basis. 16, [24] [25] [26] Statistical analyses over a large dataset of crystallographic structures are also reported. 27 These studies showed that a few key residues, like Asn 142, Glu 166, Met 165, Glu 189 and Thr 26 also have a big impact on the affinity of the inhibitor while not participating in the proteolytic mechanism. These residues take part in all subpockets of the active site, showing that an inhibitor should extend throughout all sub-pockets to have the best binding affinity possible. The affinity prediction methodology developed and applied to obtain the following results is based on the scoring of protein-ligand complexes. Before scoring such complexes, we need to generate a bound pose of the ligand inside the protein structure. Rigid docking is widely deployed to yield such a complex without taking into account the flexibility of the receptor. In order to do so, we decided to use an ensemble docking strategy. Ensemble docking is now well established in the field of early-stage drug discovery and allows one to dock each ligand against multiple rigid receptor conformations. By doing ensemble docking we have a better chance to predict an accurate ligand pose. The "ensemble" of receptor conformations can be obtained by using molecular dynamics simulations. On March 27th, 2020 the D. E. Shaw Research Group released a 100 µs molecular dynamics simulation 35 starting from the apoenzyme structure and was determined by X-ray crystallography (PDB ID: 6Y84). We have clustered this trajectory with GROMOS RMSD-based cluster algorithm 36 . The RMSD calculation was done on backbone atoms of the protein pocket and the protein pocket was defined by atoms within 10 Å of ligand N3 in protein 6LU7 (Figure 2) . A cut-off of 2.2 Å was used to create clusters. It was chosen in order to obtain 90% of the trajectory in the first 10 clusters. Finally centroïds of these 10 clusters were picked as inputs for our ensemble docking protocol (Figure 3 ). The preparation of the protein includes the critical step of protonating the crystallographic structure. This influences the orientation and reactivity of residues such as histidines, glutamines and asparagines. As the catalytic dyad of the 3CLPro protein includes a histidine residue, this step is more important than ever. In order to be reactive, the catalytic cysteine needs to be able to go to its thiolate form (see Annex 3 for the full mechanism). To do so, the proton of the thiol is engaged in a hydrogen bond with the ε nitrogen of the catalytic histidine, meaning that its proton has to be on the δ nitrogen. Other residues with ambiguous protonation or topology were left as is from the molecular dynamics run after checking that they all made sense given the local environment of the residues. We have prepared a database comprising FDA approved drugs and prodrugs libraries 37 as well as known HIV and HCV proteases inhibitors 38 Non-specific interacting poses were filtered out using the contacts they made or did not make with specific residues in the binding pocket. We evaluated in greater detail the molecule's poses that interact with Cys 145, Ser 144 and G143 for their role in the catalytic process. We also chose Asn 142, Glu 166, Met 165 and Gln 189 due to their high prevalence in the energetic contribution of numerous inhibitors, as was said before. Docking poses that passed our custom interactions filter were then rescored with our affinity predictor. The protein-ligand complex structures were prepared with the Amber16 tool set 40 . Proteins, ligands and water were represented by ff14SB 41 , GAFF1.81 42 and TIP3P 43 models, respectively. The free energy of binding was then evaluated according to equation 1. Our prediction of binding free energy takes only a few minutes per compound, each running on 2 cpu cores (c5.large AWS EC2). Our docking/scoring protocol is automated and scaled on a dedicated, secured cloud. The docking and screening procedure of the 1400 molecules was completed in 30 minutes wall time, producing 2100+ scores. The first 10 compounds with the highest affinity are ranked in Table 1 . Their ∆ range from -52.2 to -43.1 kcal/mol. Their placement in the distribution can be found ∆ in the supplementary information (Annex 1). All values along with the top 100 ∆ molecules can be found in the supplementary information as well (Annex 2). We decided to present only the first 10 molecules here, but molecules in the top 100 might be interesting to investigate. In the top 10, two are prodrugs -their active species are metabolites -that are unlikely to remain under this chemical form in a living organism and therefore to interact with the protein under these conditions. Still, they do offer insight on pharmacophores that are capable of binding and blocking the activity of 3CLPro. Other HIV-antiviral drugs (Indinavir, Saquinavir, Tipranavir, Amprenavir, Darunavir, Palinavir, Nelfinavir and Mozenavir) are well-ranked (23rd, 28th, 34th, 40st, 55th, 68th, 96th and 100th). Vilazodone and dipyridamole, which are both measured as active against 3CLPro, are also well-ranked (42nd and 45th) in our screening. In this study, we leveraged Aqemia's absolute binding free energy calculation based on the molecular density functional theory to find known drugs with potential inhibition effects on 3CLPro. It is a first step toward the repositioning of available drugs in the treatment of COVID-19. Some of the highlighted drugs have already proven to be effective against SARS-CoV-2, some of these both in-vitro and clinically like Telaprevir, dipyridamole and vilazodone. We highlighted some drugs that are not in clinical trials yet and may help in the treatment of COVID-19. These in silico results need in vitro confirmation. This work was supported by Sanofi. Aqemia is a deeptech startup in drug design. Our ambition is to discover better and more innovative therapeutic molecules faster. Better molecules because our physics-based technology has unparalleled precision. More innovative molecules because we don't rely on past data which allow us to explore a more diverse chemical space far from drugs on the market. Faster because our precision requires less experiments. Aqemia leverages a unique technology based on 8 years of research in quantum and statistical mechanics at Oxford (UK), Cambridge (UK) and École Normale Supérieure (Paris). Our algorithms are able to compute the affinity between molecules as precisely as traditional experiments and 10 000x faster than competition. This technology guides our AI towards the molecule that has maximum affinity for the therapeutic target and validates other biological and physico-chemical properties. Annex 1 : distribution of (kcal/mol) on the ligand database (one pose per compound). ∆ Annex 2 : top 100 best scored molecules coming from our screening. Hunting for New Drugs with AI A Review of Computational Drug Repositioning: Strategies, Approaches, Opportunities, Challenges, and Directions Review of Drug Repositioning Approaches and Resources High-Throughput Free Energies and Water Maps for Drug Discovery by Molecular Density Functional Theory Genome Composition and Divergence of the Novel Coronavirus (2019-NCoV) Originating in China Coronavirus Biology and Replication: Implications for SARS-CoV-2 Severe Acute Respiratory Syndrome Coronavirus Papain-like Protease: Structure of a Viral Deubiquitinating Enzyme A New Coronavirus Associated with Human Respiratory Disease in China A Pneumonia Outbreak Associated with a New Coronavirus of Probable Bat Origin Coronavirus Main Proteinase (3CLpro) Structure: Basis for Design of Anti-SARS Drugs An Overview of Severe Acute Respiratory Syndrome-Coronavirus (SARS-CoV) 3CL Protease Inhibitors: Peptidomimetics and Small Molecule Chemotherapy Proteases and Protease Inhibitors in Infectious Diseases Design and Synthesis of Peptidomimetic Severe Acute Respiratory Syndrome Chymotrypsin-like Protease Inhibitors Synthesis and Evaluation of SARS-CoV and MERS-CoV 3C-like Protease Inhibitors Revealing the Molecular Mechanisms of Proteolysis of SARS-CoV-2 Mpro by QM/MM Computational Methods Targeting the Dimerization of the Main Protease of Coronaviruses: A Potential Broad-Spectrum Therapeutic Strategy Targeting Severe Acute Respiratory Syndrome-Coronavirus (SARS-CoV-1) with Structurally Diverse Inhibitors: A Comprehensive Review Design of Wide-Spectrum Inhibitors Targeting Coronavirus Main Proteases The SARS-CoV-2 Main Protease as Drug Target Crystal Structure of SARS-CoV-2 Main Protease Provides a Basis for Design of Improved α-Ketoamide Inhibitors Hilgenfeld, R. α-Ketoamides as Broad-Spectrum Inhibitors of Coronavirus and Enterovirus Replication: Structure-Based Design, Synthesis, and Activity Assessment Crystallographic Studies on Coronaviral Proteases Enable Antiviral Drug Design Ul-Haq, Z. Identification of Chymotrypsin-like Protease Inhibitors of SARS-CoV-2 via Integrated Computational Approach In-Silico Approach for Identification of Effective and Stable Inhibitors for COVID-19 Main Protease (Mpro) from Flavonoid Based Phytochemical Constituents of Calendula Officinalis Rational Approach toward COVID-19 Main Protease Inhibitors via Molecular Docking, Molecular Dynamics Simulation and Free Energy Calculation Unveiling the Molecular Mechanism of SARS-CoV-2 Main Protease Inhibition from 137 Crystal Structures Using Algebraic Topology and Deep Learning The Resurgence of Covalent Drugs 3CLpro Inhibitors as a Potential Therapeutic Option for COVID-19: Available Evidence and Ongoing Clinical Trials Repurposing Drugs for the Management of COVID-19 Drug Repurposing Screen Identifies Masitinib as a 3CLpro Inhibitor That Blocks Replication of SARS-CoV-2 in Vitro Identification of Potential Molecules against COVID-19 Main Protease through Structure-Guided Virtual Screening Approach Potential Therapeutic Effects of Dipyridamole in the Severely Ill Patients with COVID-19 Molecular Dynamics Simulations Related to SARS-CoV-2 Peptide Folding: When Simulation Meets Experiment Drugs@FDA Data Files DrugBank 5.0: A Major Update to the DrugBank Database AutoDock Vina: Improving the Speed and Accuracy of Docking with a New Scoring Function, Efficient Optimization and Multithreading Improving the Accuracy of Protein Side Chain and Backbone Parameters from Ff99SB Development and Testing of a General Amber Force Field Comparison of Simple Potential Functions for Simulating Liquid Water Expanding the Frontiers of Existing Antiviral Drugs: Possible Effects of HIV-1 Protease Inhibitors against SARS and Avian Influenza The In-Vitro Effect of Famotidine on Sars-Cov-2 Proteases and Virus Replication Virtual Screening of an FDA Approved Drugs Database on Two COVID-19 Coronavirus Proteins cxcalc dominanttautomerdistribution was/were used for protonation states generation, cxcalc dominanttautomerdistribution RDKit: Open-source cheminformatics