key: cord-0793570-gjd6lse6
authors: Mandour, Yasmine M.; Zlotos, Darius P.; Alaraby Salem, M.
title: A multi-stage virtual screening of FDA-approved drugs reveals potential inhibitors of SARS-CoV-2 main protease
date: 2020-10-23
journal: Journal of biomolecular structure & dynamics
DOI: 10.1080/07391102.2020.1837680
sha: 5356c22393d645eb037b36eb34b6fcf7325fa90e
doc_id: 793570
cord_uid: gjd6lse6

Coronavirus disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is an ongoing global health emergency. Repurposing of approved pharmaceutical drugs for COVID-19 treatment represents an attractive approach to quickly identify promising drug candidates. SARS-CoV-2 main protease (M(pro)) is responsible for the maturation of viral functional proteins making it a key antiviral target. Based on the recently revealed crystal structures of SARS-CoV-2 M(pro), we herein describe a multi-stage virtual screening protocol including pharmacophore screening, molecular docking and protein-ligand interaction fingerprints (PLIF) post-docking filtration for efficient enrichment of potent SARS-CoV-2 M(pro) inhibitors. Potential hits, along with a cocrystallized control were further studied via molecular dynamics. A 150-ns production trajectory was followed by RMSD, free energy calculation, and H-bond analysis for each compound. The applied virtual screening protocol led to identification of five FDA-approved drugs with promising binding modes to key subsites of the substrate-binding pocket of SARS-CoV-2 M(pro). The identified compounds belong to different pharmaceutical classes, including several protease inhibitors, antineoplastic agents and a natural flavonoid. The drug candidates discovered in this study present a potential extension of the recently reported SARS-CoV-2 M(pro) inhibitors that have been identified using other virtual screening protocols and may be repurposed for COVID-19 treatment.

Coronaviruses (CoV) have been described for more than five decades since the isolation of the prototype murine strain, JHM, which was reported in 1947 (Bailey et al., 1949) . Although numerous studies of their replication mechanism and pathogenesis have been reported since the 1970s, the family of CoVs received much attention when a new human CoV was recognized to be responsible for the highly contagious and fatal disease, severe acute respiratory syndrome (SARS) (Drosten et al., 2003; Rota et al., 2003) .

Coronaviruses belong to the family Coronaviridae, and are composed of an enveloped single-stranded, positive-sense RNA with a genome ranging from 26 to 32 kb in length (Payne, 2017) . They can be classified into four genera: alpha, beta, delta, and gamma, out of which alpha and beta CoVs are known to infect humans (Payne, 2017) . The recent pandemic COVID-19 is caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) which belongs to beta CoVs. Xu et al., 2020) . The genomic sequence of SARS-CoV-2 shows 76.5% identity to that of SARS-CoV (Y. Z. Zhang & Holmes, 2020) . COVID-19 is a highly contagious disease resulting in an unprecedented number of infected people. World Health Organization has declared COVID-19 as a public health emergency of international concern (WHO, 2020). Despite the efforts to discover potential therapeutics, there are currently no approved COVID-19 drugs or vaccines. The only exception is remdesvir, an RNA dependent RNA polymerase inhibitor, for which Gilead Science was granted Emergency Use Authorization by the FDA on May, 1 st , 2020 (FDA, 2020) . Repurposing of approved drugs for COVID-19 treatment is an attractive approach to quickly identify promising drug candidates that have been already optimized in terms of pharmacokinetics and limited toxic side effects. Indeed, a recent study demonstrated that each drug from the DrugBank database has on average three different drug targets (J. Wang et al., 2019) .

The COVID-19 virus main protease (M pro ) is a cysteine protease responsible for proteolysis of replicase polyproteins resulting in the formation of various functional proteins which play a pivotal role in mediating viral replication and transcription Hilgenfeld, 2014) .The functional importance of M pro in the viral life cycle, together with the absence of closely related homologues in humans identified M pro as an attractive target for antiviral drugs (Bartlam et al., 2007) . Although numerous reports had been published on SARS-CoV M pro , no protease inhibitor has yet successfully completed a preclinical development program Bartlam et al., 2007; Hilgenfeld, 2014) . Recently, several X-ray crystal structures of SARS-CoV-2 M pro in complex with covalent inhibitors were determined showing high structural similarity to that of SARS-CoV M pro , as expected from the 96% sequence identity (Jin et al., 2020; . These crystal structures have been recently used in several computational drug repurposing studies (Ancy et al., 2020; Aouidate et al., 2020; Arun et al., 2020; Chatterjee et al., 2020; Elmezayen et al., 2020; Khan et al., 2020; Kumar et al., 2020; J. Wang, 2020) . Additional X-ray structure of SARS-CoV-2 M pro in complex with reversible dipeptide inhibitor X77 has been recently published (Mesecar, 2020) . However, to the best of our knowledge, this latter crystal structure has not been used for virtual screening of drug databases.

The present study describes an alternative drug repurposing approach to identify promising inhibitors of SARS-CoV-2 M pro . Based on the crystal structures of SARS-CoV-2 M pro , a highly selective pharmacophore model was built and used along with docking and PLIF post-docking filtration to screen the currently FDA approved drugs. The binding mode of potential inhibitors was determined and examined for the proper orientation of the pharmacophores in the binding site. Binding modes were further analyzed and compared to a cocrystallized non-covalent ligand by molecular dynamics and molecular mechanics Poisson-Boltzmann surface area (MM-PBSA) free energy calculations. The identified molecules provide new chemotypes for the development of SARS-CoV-2 M pro inhibitors and/or may be considered for COVID-19 treatment.

All molecules were prepared in Molecular Operating Environment (MOE) (MOE, 2019) version 2019.01 by washing, partial charges calculation and energy minimization using the MMFF94x force field and a gradient of 0.0001 kcal/ (mol . Å). Multi-conformations of compounds were generated by the stochastic conformational search algorithm implemented in MOE (MOE, 2019) using the following settings: energy window (7 kcal/mol), elimination of duplicate conformer threshold (RMSD, 0.25 Å), total number of iterations (10000 steps), rejection limit (100 steps), MM iteration limit (10000 steps) and maximum conformation limit (10000 conformers).

A structure-based pharmacophore model was constructed based on the two X-ray crystallographic structures of the SARS-CoV-2 M pro complexed with N3 (PDB code: 6LU7) (Jin et al., 2020) and X77 (PDB code: 6W63) (Mesecar, 2020) . In order to get more information about the active site hotspots, the most favorable energetic properties for ligand binding were determined using the interaction potential map in MOE (MOE, 2019) . Interaction energies were calculated using the following chemical probes: aromatic CH group, amide NH group (hydrogen bond donor), carbonyl oxygen atom (hydrogen bond acceptor) and dry (hydrophobic) probe. Regions of favorable interactions were used to define the pharmacophore features. The coordinates of the cocrystallized ligands were used to define the grid box subjected to energy calculation and for pharmacophore feature placement. The generated pharmacophore model was validated by retrospective screening of a test set database composed of 166 molecules. The test set included 12 SARS-CoV M pro inhibitors collected from literature (Barretto et al., 2005; Jacobs et al., 2013; Jain et al., 2004; Shie et al., 2005; Shimamoto et al., 2015; Thanigaimalai, Konno, Yamamoto, Koiwai, Taguchi, Takayama, Yakushiji, Akaji, Kiso, et al., 2013; Turlington et al., 2013; Xue et al., 2008; H. Z. Zhang et al., 2006; Zhu et al., 2011) which were labelled as actives (Table S1 in Supporting Information). The remaining 154 molecules were labelled as inactives, comprising 10 weak inhibitors of SARS-CoV M pro (B egu e & Bonnet-Delpon, 1991; Gelb et al., 1985; Jain et al., 2004; Shao et al., 2008; Shie et al., 2005) and 144 DUD-E (Mysinger et al., 2012) generated decoys. All molecules were prepared as described in the ligand preparation Section 2.1. The test set compounds were mapped to the pharmacophore model using the pharmacophore search protocol available in MOE. The following metrics were used to evaluate the model performance: sensitivity (Se), specificity (Sp) and enrichment factor (EF).

A genetic algorithm based on Cambridge Crystallographic Data Center (CCDC) Genetic Optimization for Ligand Docking (GOLD version 5.8) (Jones et al., 1995; 1997) was employed for molecular docking based on the crystal structure of SARS-CoV-2 M pro complexed with the reversible inhibitor "X77" (PDB code: 6W63) (Mesecar, 2020) . Binding site residues were defined by specifying the crystal structure ligand coordinates and using the default cutoff radius of 6 Å, with the "detect cavity" option enabled. The docking experiments were performed using the goldscore scoring function. The search efficiency of the genetic algorithm was at 200% setting with the receptor kept rigid. Water molecules were kept in the pocket while allowing the ligand to displace them during the docking experiment. For each compound, 50 complexes were generated and clustered based on their RMSD with the threshold set at 1.5 Å using the complete linkage method. The quality of pose prediction was assessed by calculating the heavy atom RMSD between the docked poses and the original PDB coordinates of X77. The resultant docked poses were rescored using DrugScore (DSX) (version 0.9) (Neudert & Klebe, 2011) utilizing the DrugScore PDB potential. All the docked poses were imported into MOE database for calculating their Protein-Ligand Interaction Fingerprints (PLIF) rows which were utilized for generating amino acids interaction fingerprints using eight types of interactions (sidechain hydrogen bonds (donor or acceptor), backbone hydrogen bonds (donor or acceptor), solvent hydrogen bonds (donor or acceptor), ionic interactions, and p interactions). The cavity used for the PLIF analysis consisted of the same set of residues used in the docking experiments. Finally, the resultant docked poses were filtered using a set of reference PLIFs and the docking protocol validated by retrospective screening of the same test set used in pharmacophore validation. Figures were prepared using Pymol (Schrodinger, LLC, 2015) .

The validated pharmacophore model was utilized as a 3 D query for screening the FDA approved drug dataset retrieved from Selleckchem Inc. (WA, USA, http://www.selleckchem. com). The dataset comprised 2684 compounds which were prepared as described in the ligand preparation section and mapped to the pharmacophore model such that hit molecules match all the query features. Using GOLD, the identified hit compounds were docked into the active site of SARS-CoV-2 M pro (PDB code: 6W63) (Mesecar, 2020) using goldscore scoring function. The search efficiency of the genetic algorithm was at 200% setting with the receptor kept rigid. Finally, the docked poses were filtered using the predefined PLIF and their binding modes were visually examined to check their similarity to those of SARS-CoV-2 M pro cocrystallized ligands. The selected protein-drug complexes were further subjected to MD simulations for understanding the structural stability of protein-drug complexes at the long-interval.

Molecular dynamics were performed on 5 of the six most promising hits from our virtual screening in complex with the SARS-CoV-2 M pro ; namely: boceprevir, epirubicin, nelfinavir, rutin and thymopentin. Bortezomib was not considered due to the inherent difficulty of deriving parameters for the boron atom. Topology and coordinate files were prepared via Ambertools 14 (D.A. Case, 2020) . One extra dynamics trajectory was run for the SARS-CoV-2 M pro in complex with the co-crystallized ligand named X77 (PDB code: 6W63) (Mesecar, 2020) , which serves as a control reference. In all trajectories, the dry MOE-prepared protein was used as initial protein coordinates. The starting coordinates for all ligands were provided from the best docked poses. The pdb4amber and reduce programs (Word et al., 1999) implemented in Ambertools (D.A. Case, 2020) where used to convert the files to an AMBER-friendly format. The all-atom force field AMBER ff14SB (Roe & Cheatham, 2013; Turner, 2005) was universally used. As the force field does not contain parameters for the ligand, all ligands were parameterized using ANTECHAMBER (J. Wang et al., 2006) to generate parameters that are consistent with the General Amber Force Field (GAFF) (J. Wang et al., 2004) . The AM1-BCC method was used to assign charges. Crystallographic waters were retained and each protein was solvated with approximately 15192 TIP3P water molecules in a 92.6 Â 69.7 Â 95.4 Å box which extends 12 Å beyond solute in each direction. Sodium ions were added to neutralize the negatively-charged protein, given the positive charge on nelfinavir and thymopentin, and additional 27 NaCl ions were added to reach a salt concentration of 100 mM to resemble cellular conditions.

MD simulations were carried out using the PMEMD.cuda code of the AMBER Molecular Dynamics package (D.A. Case, 2020) following a standard protocol adopted of minimization, heating, density equilibration and production. The AMBER input files are similar to those in the supplementary information of the previous work of Salem and Brown (Alaraby Salem & Brown, 2015) while using 306 as the number of residues (305 protein residues in addition to the ligand) whenever restraints are applied. The trajectory lengths for heating, density equilibration and production were 20 ps, 50 ps and 150 ns, respectively. Extra 100 ns of production where simulated for epirubicin, as explained in the discussion. Prior to heating, we applied two cycles of minimization to each complex. We applied strong restraints on the non-solvent residues, with a force constant of 500 kcal/mol.Å2, in the first minimization run only to relieve unfavorable hydrogen contacts. Heating was employed at 300 K using the Langevin thermostat. Density equilibration and production were conducted at constant pressure (1 atm). Langevin dynamics (Loncharich et al., 1992) were generally employed and the Particle-Mesh Ewald method (Darden et al., 1993) was used to treat long-range electrostatics under periodic boundary conditions. The trajectories were analyzed using CPPTraj (Roe & Cheatham, 2013) . Plots and visual inspection of the trajectories was done using XMgrace (Turner, 2005) and VMD (Humphrey et al., 1996) , respectively.

A total of 1200 snapshots were evenly sampled from the 150-ns production phase of each trajectory; that is, every 125 ps. In the case of epirubicin, the sampling was repeated in the extra 100 ns production phase, at similar intervals. The molecular mechanics Poisson À Boltzmann surface area (MM-PBSA) was used to estimate the binding free energy, as implemented in the MMPBSA.py script available with AMBER (Miller et al., 2012) . The entropy contribution was not calculated as the method is used to mainly compare the ligands binding to the same protein, rather than attempting to predict absolute affinities (Genheden & Ryde, 2015) . Parameters for internal and external dielectric constants were set to 4 and 80, respectively. For comparison, we also calculated free energy using the generalized-Born surface area method (Genheden & Ryde, 2015) . In general, the free energy (G) of the ligand, L, or the protein, P, is computed according to the following equation:

where E bind , E el and E vdW are the bonding, electrostatic and van der Waals terms, respectively, as in standard molecular mechanics. The polar and non-polar contributions to solvation energies are G pol and G np , respectively. In MMPBSA, G pol is obtained by solving the PB equation, while in the case of GBSA, the Generalized-Born model is used. T is the absolute temperature and S is the entropy which was excluded from our calculations as mentioned above. We adopted the one-trajectory approach by computing the trajectory for a solvated complex and then computing the difference in free energy:

where G PL refers to the protein-ligand complex. This difference must lead to the cancellation of E bind from Equation (1).

The two recently resolved X-ray crystallographic structures of SARS-CoV-2 M pro complexed with N3; an irreversible substrate-like inhibitor (PDB code: 6LU7, resolution 2.16 Å) (L. ) and X77; a reversible dipeptide inhibitor (PDB code: 6W63, resolution 2.1 Å) (Mesecar, 2020) were used in this study for inferring chemical information on inhibitors' binding to SARS-CoV-2 main protease. Overlay of the two structures showed high structural similarity with an average all-atom RMSD of 1.29 Å and 0.693 Å for all protein residues and binding site residues; respectively. The substrate-binding site is located in a cleft formed between domains I and III, hosting a catalytic dyad composed of Cys145 and His41. In the peptdiomimetic inhibitor N3 the residues P1'-P5 bind to complementary subsites S1'-S5 in the substrate-binding pocket of SARS-CoV-2 M pro . Moreover, as the main protease proteins of SARS-CoV (PDB code: 1UK4) (Yang et al., 2003) and SARS-CoV-2 (PDB code:6W63) (Mesecar, 2020) show high structural similarity with an average all-atom RMSD of 1.26 Å and 0.698 Å for all protein residues and binding site residues; respectively. Structural information of previously reported SARS-CoV M pro inhibitors was used to guide the discovery of novel SARS-CoV-2 M pro inhibitors in this study

To obtain comprehensive chemical information on inhibitor binding to SARS-CoV-2 M pro , a structure-based pharmacophore model was constructed based on the two X-ray crystallographic structures of M pro complexed with N3 (L. and X77 (Mesecar, 2020) . The most favorable energetic properties for ligand binding were determined using the interaction potential map in MOE (MOE, 2019). The interaction potential is a grid-based approach developed for scanning the protein surface to identify hot spots. This method is based on determination of the chemical nature of the protein's active site by calculating interaction energies with chemical probes of different electronic properties. Regions of favorable interactions represent hotspots and were used as a basis to define the pharmacophore features. The resultant pharmacophore model comprised features representing mainly S1'-S3 subsites. The generated structurebased model included five features, namely two hydrogen bond acceptors (Acc), one hydrogen bond acceptor projection (Acc2), one hydrogen bond donor (Don) and a hydrophobic center (Figure 1(A) ). Overlay of the crystallized coordinates of the substrate-like inhibitor "N3" on the pharmacophore model showed its peptide P1 orientation defined by two features (Figure 1(B) ), namely the hydrogen bond interactions of the side chain of P1 with the imidazole ring of His163 and backbone carbonyl of Phe140 represented by F1:Acc and F2:Don; respectively. This complies with the S1 subsite specificity to P1-Gln, similarly to other previously reported M pro inhibitors where anchoring the ligand to this site appeared essential to block the enzyme's catalytic activity (F. Wang et al., 2016; Xue et al., 2008) . The hydrogen bond acceptor projection feature (F3:Acc2) represents the hydrogen bond interaction of P1' residue of N3 with the backbone amine of Gly163 in S1' subsite. A hydrophobic feature (F4: Hyd) reflects the side chain of P2 fitting in the S2 subsite lined by side chains of His41 and Met49. The S2 subsite in beta coronaviruses is mainly hydrophobic with both aromatic and aliphatic amino acids reported binding to this site (F. Wang et al., 2016) . This was also reflected by the high interaction potentials of the dry probe observed in this site. The last feature is a hydrogen bond acceptor (F5: Acc) accounting for the H-bond between the backbone carbonyl of P3 and the backbone amine of Glu166. Retrospective screening was conducted to evaluate the ability of the pharmacophore model to separate active from inactive compounds. To this end, a small-molecule test set was generated for model validation. Thus, 12 peptide-like M pro inhibitors of diverse structures, that showed high affinity to the protease enzyme and shared a similar binding mode were selected and labeled as actives Jain et al., 2004; Shie et al., 2005; Shimamoto et Table S1 in Supporting Information). The multi-conformation test set compounds were mapped to the pharmacophore model. As shown in Table 1 , the model could retrieve 9 out of 12 known M pro inhibitors (sensitivity ¼ 0.75), exclude 110 out of 154 inactives (specificity ¼ 0.71) and showed a mapping EF of 2.35.

Molecular docking simulation studies were performed using GOLD 5.8 (Cambridge Crystallographic Data Centre, Cambridge, UK) (Jones et al., 1995; 1997) . Only one crystal structure of SARS-CoV-2 M pro -ligand complex was employed for docking as the binding site did not undergo conformational changes upon binding of different inhibitors and thus the crystal structure of SARS-CoV-2 M pro complexed with the reversible inhibitor "X77" (PDB code: 6W63) (Mesecar, 2020) was used. To validate the docking protocol, the cocrystallized ligand was docked into its binding pocket of M pro enzyme. All the resultant poses converged to a binding mode similar to that of the experimentally determined position of X77 with the best ranking pose having a root-mean square deviation (RMSD) value of 0.52 Å (Table S2 , Supporting Information). An overlay of the docked pose and crystal structure of X77 is shown in the Supporting Information ( Figure S1 ). The binding orientation of X77 is overall similar to known covalent substrate-like inhibitors (F. Wang et al., 2016; , where it appeared preferentially occupying the S1 0 -S3 subsites of SARS-CoV-2 M pro enzyme. In this orientation, the cyclohexyl amide occupied S3 subsite, the tert-butyl phenyl group occupied a deep S2 subsite and the imidazole amide and the 3-pyridyl group occupied S1' and S1 subsites; respectively. Docking of the test set compounds was conducted to further assess the ability of the docking protocol to discriminate active from inactive compounds. Goldscore showed poor enrichment performance with an EF at 1% and 5% of the ranked list of 0 and 1.87, respectively. The docked poses were rescored using DSX PDB (Neudert & Klebe, 2011) which did not show any improvement in the EF values compared to Goldscore. Accordingly, the docked poses were assessed using protein-ligand interaction fingerprints (PLIFs) which were previously reported to give better results than standard scoring functions in terms of identifying the correct binding modes of the ligand and recovering active compounds in virtual screening (VS) trials (Anighoro & Bajorath, 2016a , 2016b Da & Kireev, 2014) . Interactions with key residues targeting the enzyme catalytic residues (Cys145 and His 41) in addition to Glu166 located at the center of the substrate-binding site ensures a binding mode accessing the key enzyme subsites. Filtering the docked poses of the test set compounds based on these PLIFs correctly identified 9 out of 12 known M pro inhibitors (sensitivity ¼ 0.75), excluded 103 out of 154 inactives (specificity ¼ 0.67) with a mapping EF of 2.35 (Table 1) . These metrics reflect the individual performance of PLIF filtration which appeared similar to those of the pharmacophore model. Cascading the filters; pharmacophore, docking and PLIF in a multi-stage manner; could decrease the falsepositive rate. After pharmacophore virtual screening, 9 of 12 actives and 110 out of 154 inactives were correctly predicted. Accordingly, a test set composed of 9 actives and 44 inactives was selected as the initial set for docking and PLIF filtration. The multi-level VS showed better enrichment than either of the screening methods alone retrieving all 9 actives (sensitivity ¼ 0.75), correctly excluding 34 inactives (specificity ¼ 0.06) with a significant increase in the mapping EF value of 6.55 (Table 1 ). The need for sequential filtration can be explained by the large variation in the chemical space and size of reported SARS-CoV M pro inhibitors along with the large size of the substrate-binding pocket of M pro enzyme and accordingly was used for prospective screening.

Virtual screening was carried out hierarchically using the structure-based pharmacophore model, molecular docking along with poses selection using the predefined PLIFs and finally, visual inspection of the binding modes of the hit compounds. The FDA approved drug dataset comprising 2684 compounds was retrieved from Selleckchem Inc. (WA, USA, http://www.selleckchem.com). An extensive stochastic conformational search was conducted for all compounds with the resultant conformers screened using the pharmacophore model described earlier narrowing down the database to 158 compounds. The hit compounds were docked into the active site of SARS-CoV-2 M pro (PDB: 6W63) (Mesecar, 2020) using GOLD (Jones et al., 1995; 1997) with filtration of the docked poses based on the predefined PLIFs reducing the number of hits to 53 compounds. The docked poses of these hits were visually examined using the structural information of the binding modes of previous SARS-CoV M pro inhibitors for evaluating their binding modes. Finally, 12 compounds revealed promising binding modes to SARS-CoV-2 M pro enzyme. The selected hits comprised molecules approved by the FDA for one or multiple therapeutic indications as: HIV and HCV infections, antineoplastic agents and natural compounds of multiple indications (Table S3 in the supporting information). Six compounds were particularly interesting because of the high resemblance between their binding modes and those of SARS-CoV-2 M pro crystallized ligands; X77 and N3. With the exception of bortezomib, molecular dynamics simulations were conducted for these promising docking hits. (Figure 2 ).

For the promising docking hits, molecular dynamics simulations were conducted using the AMBER software package (D.A. Case, 2020) . In total, 6 ligands including the co-crystal X77 ligand were studied and the stability of each MD system was explored. Figure 3(A) showed the RMSD fluctuations of the backbone atoms of the receptor along the MD simulation time relative to the initial structure. In each complex, it appears that the receptor reached a stable equilibrium after almost 25 ns. To further understand the dynamics of the backbone atoms, the root mean square fluctuation (RMSF) values were calculated for backbone atoms at each time point of the trajectories. Higher RMSF values indicate greater flexibility during the MD simulation. Low RMSF values were observed for all protein complexes indicating their high stabilities during the entire MD simulation with the overall topology similar to the native fold. (Figure S3 ) The observed RMSD of 3-4 Å is primarily attributed to variations of residues at the termini showing RMSF values > 5 Å. As illustrated in Figure 3 (B) for the RMSD of the six ligands, the co-crystallized X77 showed the best stability followed by rutin and boceprevir with RMSDs around 2 Å which is reasonable for large ligands like SARS-CoV-2 main protease inhibitors. These small RMSD fluctuations add confidence to the binding modes of rutin and boceprevir proposed by GOLD docking experiments. On the other hand, thymopentin showed the least stability as shown by its high RMSD. Visual inspection of the trajectory confirmed that thymopnetin completely detached from the binding site after 50 ns and remains in the solution. Accordingly, thymopentin was not further considered in this study. For the other 4 hits, the average structure of the collected snapshots was generated, and the MD snapshot with the highest structural similarity with this average structure was chosen as the representative conformation. The comparisons between the crystal structure and the representative MD conformations are shown in Figure 4 for Rutin and Boceprevir and in Figure 5 for Nelfinavir and Epirubicin.

Below, we describe in detail the binding poses of the promising hits from our virtual screening, based on collective evidence from docking and molecular dynamics. H-bond analysis has been performed on the production MD trajectory and the results are provided in Table 2 . Flavonoids were previously identified as potential inhibitors of SARS-CoV M pro showing inhibitory activity within the micromolar range (L. Chen et al., 2006; Yi et al., 2004) . The natural flavonoid rutin was among the identified hits maintaining almost the same binding mode as its docked pose throughout the entire simulation (Figure 4(A) ). Its chromone ring binds mainly to S1 and partially occupies S3 forming interactions with Asn142 and Glu166. The rutinose moiety had one of its saccharides binding to S2 and forming H-bonds with Asp187 and His41, which appeared persistent in 93% and 74% of the production MD trajectory snapshots, respectively (see Table 2 and Figure S2 in the supporting information). Finally, the dihydroxy phenyl ring appeared highly flexible throughout the simulation adopting orientations distinct than that observed in the initial docked pose (Figure 4(C) ).

A recent enzymatic study showed boceprevir to inhibit SARS-CoV-2 viral replication by targeting its main protease (Ma et al., 2020) . Interestingly, the binding mode of the synthetic tripeptide HCV NS3/4A protease inhibitor boceprevir mimicked that of X77; occupying the three subsites S1'-S2. Boceprevir's first peptide residue (P1) with its cyclobutyl side chain occupied S1' while its second peptide residue having a cyclopropyl-fused prolyl ring occupied S1 subsite forming Hbonds with the backbone of Gly143 and side chain of Asn142, which appeared persistent in 50% and 27% of the 150-ns MD trajectory snapshots, respectively (Figure 4(D) and Table 2 ). The 3-methyl-L-valyl group of P3 partially occupied S2 while the tertiary butyl group appeared relatively flexible throughout the simulation (Figure 4(B) ).

Among the promising hits was the HIV protease inhibitor nelfinavir. Several HIV protease inhibitors had been Table 2 . H-bond analysis of the 150 ns of the MD trajectory for selected hits identified by our VS protocol. Only H-bonds that are persistent in more than 10% of the snapshots are included. For thymopentin and epirubicin, no H-bonds were persistent more than 10%. For each H-bond, the acceptor residue and atom name are indicated in column 1. The name of the H-atom and the electronegative atom attached to it on the donor residue are given in columns 2 and 3, respectively. The major atoms contributing to H-bonds are mapped in Figure S2 . previously reported to inhibit the replication of SARS CoV (Jenwitheesuk & Samudrala, 2003) and have shown potential in reducing the symptoms of the recent COVID-19 (Z. M. Chen et al., 2020) . Nelfinavir adopted a different conformation throughout the simulation compared to its initial docked pose with a single anchor part represented by its benzoyl group fitting in S1 subsite and forming H-bonds with Glu166 ( Figure 5(A) ). Analysis of the MD trajectory reveals the formation of H-bonds with Glu166 and Gln189 in approximately 50% of the snapshots for each amino acid. On the other hand, the phenylsulfanyl group flipped 180 , fitting in S3 subsite instead of S1' in its initial docked pose. Finally, the octahydroisoquinoline ring appeared rather flexible adopting different orientations throughout the entire simulation ( Figure 5(C) ). The anthracycline antineoplastic agent valrubicin has been recently discovered as possible SARS-CoV-2 M pro inhibitor in a computational drug repurposing study (L. . Our VS protocol identified another anthracycline antineoplastic agent epirubicin. Anthracenes have been previously reported as inhibitors of dengue virus NS2B-NS3 protease, a member of the chymotrypsin family as SARS-CoV-2 M pro (Tomlinson & Watowich, 2011) . Interestingly, MD simulations of epirubicin showed it detaching from the protein and reattaching again, adopting a more solvent-exposed binding mode distinct than that of its initial docked pose ( Figure  5 (B)). The new binding mode showed epirubicin's main tetracene skeleton residing in a deep groove whose base overlap with S2 subsite and lined by side chains of His41 and Met49 and backbone atoms of Gln189 and Thr190. The oxane ring appeared more solvent exposed with its hydroxyl and basic amino groups forming H-bonds with Glu47 ( Figure 5(D) ) which appeared in only 8% of the snapshots.

Although bortezomib was not considered in MD simulations due to the inherent difficulty of deriving parameters for its boron atom, the binding mode of its docked pose was investigated. Interestingly, previous reports showed that boronic acid compounds bind reversibly to SARS-CoV M pro and inhibiting its activity (Bacha et al., 2004) . Moreover , the proteasome inhibitor carfilzomib was recently discovered by virtual screening as the top candidate among all SARS-CoV-2 M pro inhibitor hits (L. . Bortezomib represents a dipeptide boronic acid analogue belonging to the same class of anticancer agents. The docked pose of bortezomib revealed its boronic acid group mimicking the backbone carboxylic acid of P3 and forming a H-bond with Glu166. The pyrazine ring occupies the S1' subsite forming a H-bond with the catalytic Cys145. The deep S2 subsite is completely occupied by bortezomib's phenyl ring forming C-H-p interactions with His41. Finally, the isopropyl group occupied the S1 subsite ( Figure 6 ).

The ligand binding affinity was measured using the MM-PBSA method . Structures were evenly sampled from the 150-ns trajectory of each compound, and, in the case of epirubicin, from the extra 100-ns trajectory. The calculated binding free energies are summarized in Table 3 with its individual terms listed in table S4. The binding affinity of all ligands was compared to that of Figure 6 . Surface representation of the substrate-binding pocket of SARS-CoV-2 M pro (PDB code: 6W63) (Mesecar, 2020) indicating S1' -S3 subsites. Overlay of the docked poses of Bortezomib (magenta) and the crystallized coordinates of X77 (grey) (Mesecar, 2020) . Table 3 . List of MM-PBSA binding free energies (in kcal/mol) for potential inhibitors binding to SARS-CoV-2 main protease. The free energy is calculated over the course of 150 ns MD simulation for all ligands except epirubicin, which detaches and reattaches in this trajectory. The quoted free energy for epirubicin is for the extended 100 ns, after it reattaches at a close-by site in the active pocket. Table 3 , the co-crystallized ligand X77 is expected to have the largest binding affinity (-17.0 kcal/mol), closely followed by Rutin (-14.9 kcal/mol), within the range of error of the calculated standard deviation (3.4 and 4.3 kcal/ mol, respectively). Rutin is followed by Boceprevir and Nelfinavir, with MM-PBSA values of À13.6 and À13.4 kcal/ mol, respectively. Epirubicin is expected to have the lowest affinity (-11.4 kcal/mol). The relative binding affinities of the hit compounds appeared consistent with their observed stabilities throughout the MD simulations. This trend was also observed for free energy calculated by GBSA method, see table S5 in the supplementary information.

A sequential virtual screening protocol including pharmacophore modeling, molecular docking with PLIF post-docking filtration was developed for the SARS-CoV-2 M pro enzyme. Molecular dynamics simulations were performed for 5 promising hits and the cocrystallized compound followed by binding free energy calculations. Several promising known drugs stand out as potential inhibitors of SARS-CoV-2 main protease, including rutin, boceprevir, epirubicin, nelfinavir and bortezomib with binding modes mimicking those of SARS-CoV-2 M pro cocrystallized ligands N3 and X77. Most of the discovered hits belong to pharmaceutical classes that were previously reported to have inhibitory activity on several viruses including SARS-CoV, a highly homologous virus to SARS-CoV-2 (Bacha et al., 2004; L. Chen et al., 2006; Tomlinson & Watowich, 2011; Yi et al., 2004) . The five potential drug candidates discovered in this study are a valuable extension of the recently reported SARS-CoV-2 M pro inhibitors that have been identified using other virtual screening protocols and may be repurposed for COVID-19 treatment. Moreover, our screening protocol proved to be an efficient tool for a possible repurposing of FDA approved drugs towards SARS-CoV-2 main protease and could be further applied on large compound libraries such as ZINC.

Two-photon absorption of fluorescent protein chromophores incorporating non-canonical amino acids: TD-DFT screening and classical dynamics

Coronavirus main proteinase (3CLpro) structure: Basis for design of anti-SARS drugs

Possibility of HIV-1 protease inhibitors-clinical trial drugs as repurposed drugs for SARS-CoV-2 main protease: A molecular docking, molecular dynamics and binding free energy simulation study

Binding mode similarity measures for ranking of docking poses: A case study on the adenosine A2A receptor

Three-dimensional similarity in molecular docking: Prioritizing ligand poses on the basis of experimental binding modes

Identification of a novel dual-target scaffold for 3CLpro and RdRp proteins of SARS-CoV-2 using 3D-similarity search, molecular docking, molecular dynamics and ADMET evaluation

Drug repurposing against SARS-CoV-2 using E-pharmacophore based virtual screening, molecular docking and molecular dynamics with main protease as the target

Identification of novel inhibitors of the SARS coronavirus main protease 3CLpro

A murine virus (Jhm) causing disseminated encephalomyelitis with extensive destruction of myelin: II

The papain-like protease of severe acute respiratory syndrome coronavirus has deubiquitinating activity

Structural proteomics of the SARS coronavirus: A model response to emerging infectious diseases

Preparation of trifluoromethyl ketones and related fluorinated ketones

In silico analysis and identification of promising hits against 2019 novel coronavirus 3C-like main protease enzyme

Binding interaction of quercetin-3-beta-galactoside and its synthetic derivatives with SARS-CoV 3CL(pro): Structure-activity relationship studies reveal salient pharmacophore features

Diagnosis and treatment recommendations for pediatric respiratory infection caused by the 2019 novel coronavirus

Structural protein-ligand interaction fingerprints (SPLIF) for structure-based virtual screening: Method and benchmark study

Particle mesh Ewald: An NÁlog(N) method for Ewald sums in large systems

Identification of a novel coronavirus in patients with severe acute respiratory syndrome

Drug repurposing for coronavirus (COVID-19): In silico screening of known drugs against coronavirus 3CL hydrolase and protease enzymes

Food and Drug Administration (FDA)

Fluoro ketone inhibitors of hydrolytic enzymes

The MM/PBSA and MM/GBSA methods to estimate ligand-binding affinities

From SARS to MERS: Crystallographic studies on coronaviral proteases enable antiviral drug design

Free energy calculations by the molecular mechanics poisson-boltzmann surface area method

VMD: Visual molecular dynamics

Discovery, synthesis, and structure-based optimization of a series of N-(tert-butyl)-2-(N-arylamido)-2-(pyridin-3-yl) acetamides (ML188) as potent noncovalent small

Synthesis and evaluation of keto-glutamine analogues as potent inhibitors of severe acute respiratory syndrome 3CLpro

Identifying inhibitors of the SARS coronavirus proteinase

Structure of Mpro from SARS-CoV-2 and discovery of its inhibitors

Molecular recognition of receptor sites using a genetic algorithm with a description of desolvation

Development and validation of a genetic algorithm for flexible docking

Targeting SARS-CoV-2: A systematic drug repurposing approach to identify promising inhibitors against 3C-like proteinase and 2 0 -O-ribose methyltransferase

Promising inhibitors of main protease of novel corona virus to prevent the spread of COVID-19 using docking and molecular dynamics simulation

Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia

Langevin dynamics of peptides: The frictional dependence of isomerization rates of Nacetylalanyl-N'-methylamide

Boceprevir, GC-376, and calpain inhibitors II, XII inhibit SARS-CoV-2 viral replication by targeting the viral main protease

A taxonomically-driven approach to development of potent, broad-spectrum inhibitors of coronavirus main protease including SARS-CoV-2 (COVID-19)

MMPBSA.py: An efficient program for endstate free energy calculations

Chemical Computing Group ULC, 1010 Sherbrooke St. West, Suite #910

Directory of useful decoys, enhanced (DUD-E): Better ligands and decoys for better benchmarking

DSX: A knowledge-based scoring function for the assessment of protein-ligand complexes

Chapter 17-Family coronaviridae

PTRAJ and CPPTRAJ: Software for processing and analysis of molecular dynamics trajectory data

Characterization of a novel coronavirus associated with severe acute respiratory syndrome

The PyMOL Molecular Graphics System

Design, synthesis, and evaluation of trifluoromethyl ketones as inhibitors of SARS-CoV 3CL protease

Discovery of potent anilide inhibitors against the severe acute respiratory syndrome 3CL protease

Fused-ring structure of decahydroisoquinolin as a novel scaffold for SARS 3CL protease inhibitors

Development of potent dipeptide-type SARS-CoV 3CL protease inhibitors with novel P3 scaffolds: Design, synthesis, biological evaluation, and docking studies

Design, synthesis, and biological evaluation of novel dipeptide-type SARS-CoV 3CL protease inhibitors: Structure-activity relationship study

Anthracene-based inhibitors of dengue virus NS2B-NS3 protease

Discovery of N-(benzo[1,2,3]triazol-1-yl)-N-(benzyl)acetamido)phenyl) carboxamides as severe acute respiratory syndrome coronavirus (SARS-CoV) 3CLpro inhibitors: Identification of ML300 and noncovalent nanomolar inhibitors with an induced-fit binding

XMGRACE (Version 5.1.19)

Structure of main protease from human coronavirus NL63: Insights for wide spectrum anti-coronavirus drug design

Fast Identification of possible drug treatment of coronavirus disease-19 (COVID-19) through computational drug repurposing study

Development and testing of druglike screening libraries

Automatic atom type and bond type perception in molecular mechanical calculations

Development and testing of a general amber force field

Asparagine and glutamine: Using hydrogen atom contacts in the choice of side-chain amide orientation

Evolution of the novel coronavirus from the ongoing Wuhan outbreak and modeling of its spike protein for risk of human transmission

Structures of two coronavirus main proteases: implications for substrate binding and antiviral drug design

The crystal structures of severe acute respiratory syndrome virus main protease and its complex with an inhibitor

Small molecules blocking the entry of severe acute respiratory syndrome coronavirus into host cells

Design and synthesis of dipeptidyl glutaminyl fluoromethyl ketones as potent severe acute respiratory syndrome coronovirus (SARS-CoV) inhibitors

Crystal structure of SARS-CoV-2 main protease provides a basis for design of improved alpha-ketoamide inhibitors

A genomic perspective on the origin and emergence of SARS-CoV-2

Peptide aldehyde inhibitors challenge the substrate specificity of the SARS-coronavirus main protease

This research was enabled in part by support provided by Westgrid (www.westgrid.ca) and Compute/Calcul Canada (www.computecanada. ca). Access to Westgrid was provided through collaboration with Professor Alex Brown, University of Alberta. We also thank Dr. Ahmed Taha Ayoub for helpful discussions and useful insights on the MD part of the study.

The authors declare no conflict of interest.

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Yasmine M. Mandour http://orcid.org/0000-0001-7104-3268

All data are within the manuscript and the supplementary materials.