key: cord-0822560-lq1vidqh authors: Gossen, Jonas; Albani, Simone; Hanke, Anton; Joseph, Benjamin P.; Bergh, Cathrine; Kuzikov, Maria; Costanzi, Elisa; Manelfi, Candida; Storici, Paola; Gribbon, Philip; Beccari, Andrea R.; Talarico, Carmine; Spyrakis, Francesca; Lindahl, Erik; Zaliani, Andrea; Carloni, Paolo; Wade, Rebecca C.; Musiani, Francesco; Kokh, Daria B.; Rossetti, Giulia title: A blueprint for high affinity SARS-CoV-2 Mpro inhibitors from activity-based compound library screening guided by analysis of protein dynamics date: 2020-12-18 journal: bioRxiv DOI: 10.1101/2020.12.14.422634 sha: 177dfa090e5e67e0d7252f1826c737ab0ac53e86 doc_id: 822560 cord_uid: lq1vidqh The SARS-CoV-2 coronavirus outbreak continues to spread at a rapid rate worldwide. The main protease (Mpro) is an attractive target for anti-COVID-19 agents. Unfortunately, unexpected difficulties have been encountered in the design of specific inhibitors. Here, by analyzing an ensemble of ~30,000 SARS-CoV-2 Mpro conformations from crystallographic studies and molecular simulations, we show that small structural variations in the binding site dramatically impact ligand binding properties. Hence, traditional druggability indices fail to adequately discriminate between highly and poorly druggable conformations of the binding site. By performing ~200 virtual screenings of compound libraries on selected protein structures, we redefine the protein’s druggability as the consensus chemical space arising from the multiple conformations of the binding site formed upon ligand binding. This procedure revealed a unique SARS-CoV-2 Mpro blueprint that led to a definition of a specific structure-based pharmacophore. The latter explains the poor transferability of potent SARS-CoV Mpro inhibitors to SARS-CoV-2 Mpro, despite the identical sequences of the active sites. Importantly, application of the pharmacophore predicted novel high affinity inhibitors of SARS-CoV-2 Mpro, that were validated by in vitro assays performed here and by a newly solved X-ray crystal structure. These results provide a strong basis for effective rational drug design campaigns against SARS-CoV-2 Mpro and a new computational approach to screen protein targets with malleable binding sites. Mpro, despite the identical sequences of the active sites. Importantly, application of the 1 pharmacophore predicted novel high affinity inhibitors of SARS-CoV-2 Mpro, that were validated 2 by in vitro assays performed here and by a newly solved X-ray crystal structure. These results 3 provide a strong basis for effective rational drug design campaigns against SARS-CoV-2 Mpro 4 and a new computational approach to screen protein targets with malleable binding sites. 5 6 INTRODUCTION 7 In December 2019, a new coronavirus (CoV), belonging to the clade b of the Betacoronavirus 8 viral genus, caused an outbreak of pulmonary disease in the Hubei province in China. 1, 2 In the 9 first months of 2020, the new pandemic spread globally and it is still continuing. The virus shares 10 more than 80% of its genome with that of the SARS coronavirus discovered in 2002 CoV). 1, 2 Hence it has been named severe acute respiratory syndrome-coronavirus 2 (SARS-CoV-12 2) by the International Committee on Taxonomy of Viruses. 13 The most promising tools for the cessation of the epidemic spread of COVID-19 are vaccines. 3 14 The majority of them could trigger an immunogenic response using inactivated viral vectors or 15 RNA sequences encoding SARS-CoV-2 spike glycoprotein. 4 Unfortunately, the SARS-CoV-2 16 spike protein uses conformational masking and glycan shielding to frustrate the immune 17 response, possibly hindering vaccine effectiveness. 5 18 Interfering with viral replication is an alternative and promising strategy of treatment. In this 19 context, the chymotrypsin-like proteinase (often referred to as the main protease, Mpro hereafter) 20 is an excellent pharmaceutical target. 6, 7 It does not depend on host immunogenic responses and 21 it is essential for generating the 16 non-structural proteins, critical to the formation of the replicase 22 complex. 23 Mpros are highly conserved enzymes across CoVs. 8, 9 SARS-CoV Mpro was already suggested 24 as one of the main drug targets in the pandemic associated with that virus, about 15 years ago. 10-25 12 Inhibitors of proteases (e.g. aspartyl protease) are also common drugs used in the clinic against 26 other deadly viruses, e.g. HIV-1. 13 27 The SARS-CoV Mpro active form is a homodimer ( Figure 1A) , with each monomer consisting of 28 N-terminal, catalytic and C-terminal regions 14 ( Figure 1B ). Mpros were shown to process 29 polyproteins on diverse cleavage sites, using a cysteine/histidine catalytic dyad: 15 the histidine 30 (His41 in SARS-CoV-2) forms a hydrogen bond (Hbond) with a water molecule that, in turn, 31 interacts with an aspartate (Asp187) and a histidine (His164) side chain. Asp187 is further 32 stabilized through a salt-bridge with a nearby arginine (Arg40, Figure 1C ). In this way, His41 can 33 act as a base, extracting a proton from the catalytic cysteine (Cys145) sidechain and activating it 34 for the nucleophilic attack that cuts the polypeptide. 35 SARS-CoV-2 Mpro shares 96% sequence identity with Mpro from SARS-CoV (Section S1, Figure 36 S1A). Twelve residues differ between both Mpros and only one, namely, Ser46 in SARS-CoV-2 37 (Ala46 in SARS-CoV), is located at the mouth of the active site cavity ( Figure S1B ). The binding 38 sites share 100% of sequence identity ( Figure S1A ). Thus, exploiting the known libraries of CoV Mpro inhibitors has been a strategy followed by many research groups. Unfortunately, most 40 SARS-CoV Mpro inhibitors with good (nM) activity against SARS-CoV Mpro in vitro and in cell-41 based assays, exhibited limited (sub-µM) potency against the protein from SARS-CoV-2 in 42 enzymatic assays, and low-µM IC50 values (4-5 µM) in cell-based assays. 16 proteases, 24-29 led to clinical advances. Because only 12 residues, far from the binding site, differ 14 between SARS-CoV Mpro and SARS-CoV-2 Mpro, the mutation of distant residues can 15 substantially contribute to the binding site plasticity and to the ligand binding through allosteric 16 regulation. 30 This is both disappointing and puzzling from a pharmaceutical perspective. Recently, 17 however, it has been shown that the dipeptide prodrug GC376, and its parent GC373 inhibit the 18 two proteases with IC50 values in the nanomolar range. 31 This suggests that, despite the intrinsic 19 and significant differences between the two Mpros, common binding features against some 20 classes of high-affinity ligands are retained. 21 Molecular dynamics (MD) simulations 18 provided hints to address this riddle: they showed that the 22 SARS-CoV Mpro active sites display major differences in both shape and size. In particular, while 23 both Mpros reduce their accessible volume upon inhibitor binding by approximately 20%, the 24 maximal volume of the holo SARS-CoV Mpro active site is over 50% larger than that of SARS-25 CoV-2. In addition, the accessibility of the binding hotspots (i.e. the key residues for substrate 26 binding) and the flexibility of one of the two loops delimiting the binding pockets (ASL1 in Figure 27 1B) differs between the two Mpros. 18 The simulations indicate that the binding sites of the two 28 Mpros are dynamically diverse and that ligand binding can impact them differently. 29 Therefore, transferable binding features across Mpros, as well as unique ones for SARS-CoV-2 30 are difficult to predict: the exceptional flexibility and plasticity of the binding site is here coupled 31 with large adjustments of the cavity shape in response to the binding of an inhibitor. This clearly 32 emerges from an analysis of the SARS-CoV-2 Mpro binding pocket's conformational changes 33 (performed here) across the majority of the 196 X-ray crystal structures available in the Protein 34 Data Bank up to September 30 th 2020. 18, [32] [33] [34] This flexibility makes a rational drug-design 35 approach extremely challenging: 18, 35 the screening potential of Mpro conformational space is too 36 large, too flexible, unpredictable, and the actual available binding space can differ significantly 1 from ligand to ligand. 18, 32, 36 2 It is therefore imperative to identify the relationship between SARS-CoV-2 conformational space, 3 flexibility, druggability and ligand binding. Here, we analyzed the mentioned 196 X-ray crystal 4 structures, along with about ~31,000 conformations extracted, not only from the longest (100 µs) 5 MD simulation of SARS-CoV-2 MPro so far 37 , but also from binding site enhanced sampling 6 simulations carried out here. Among these structures, we selected ~24,000 conformations for 7 which we systematically performed druggability analyses on the binding sites. The top 110 8 druggable structures were selected for virtual screening of a sample library of ~12,600 ligands. 9 The latter includes marketed drugs and compounds under development, the internal chemical 10 libraries from Fraunhofer Institute and the Dompè pharma company, as well as the so-far known 11 inhibitors of SARS-CoV Mpro from literature. 12 We redefine here the protein druggability in a new way exploiting the chemical space shaped by 13 the different configurations of the binding site upon virtual screening. Specifically, we identified a 14 consensus protein-ligand interaction fingerprint across the chemical space and the corresponding 15 SARS-CoV-2 Mpro unique structure-based pharmacophore. Our pharmacophore was able to 16 identify known nM-binders (IC50 ≤ 400 nM) of SARS-CoV-2 Mpro and to distinguish those from 17 micromolar inhibitors. The latter was able to identify Myricetin and Benserazide as nM inhibitors, 18 here experimentally validated by enzymatic activity binding assay. The predicted binding of 19 Myricetin is also in striking agreement with the recently solved X-ray crystal structure by some of 20 us. Our pharmacophore also provides a rationale for the variability in ligand affinity of currently 21 known SARS-CoV ligands and for the general lack of transferability of SARS-CoV ligands to 22 SARS-CoV-2, shedding light on the complexity, plasticity and druggability of SARS-CoV-2 Mpro. 23 To understand how Mpro's binding site conformation affects the druggability and the chemical 26 space of selected binders, we start by considering ~30,000 Mpro conformations spanning 27 different binding site arrangements and flexibility obtained from different sources (Table 1 ). These 28 include: 29 (i) All the X-ray crystal structures deposited to date of apo and holo SARS-CoV-2 Mpro. After 30 analyzing all of them, we selected 43 structures for follow-up analysis. These exhibit a RMSD 31 lower than 1 Angstrom with respect to the excluded ones. Therefore, they can be considered a 32 good representative ensemble of the overall deposited SARS-CoV-2 X-ray crystal structures ("X-33 ray ensemble" hereafter, see Table S1 for details). 34 (ii) Two sets of MD ensembles: (a) The "MD10000" ensemble, that is 10,000 frames, taken every 35 1 ns, of a 100 µs-long MD of apo Mpro from Shaw's group 37 and (b) the "MSM ensembles", two 36 collections of 30 and 40 representative conformations extracted from a three-state and a four-37 state Markov state model (MSM), respectively (see Section S2, Figure S2 ). The MSM analysis 38 was performed on the 100 µs-long MD trajectory. These are included to identify conformations 39 representing structural changes potentially related to binding. 40 (iii) About 22,000 conformations obtained with two enhanced sampling methods implemented in 41 the TRAPP web server: 38 MD-based enhanced sampling by Langevin Rotamerically Induced 42 Perturbation (LRIP) 39 and constraint-based sampling by tConcoord 40 as referred to as a "LRIP/tC 1 ensembles" hereafter. These ensembles were derived using the X-ray crystal structures of the 2 protein in complex with the inhibitor N3 (PDB ID 6LU7) and with another α-ketoamide inhibitor 3 (PDB ID 6Y2G). 4 We discuss only these two parameters, because they turn out to be key descriptors of the SARS-10 CoV-2 Mpro active site druggability (see below). The latter was evaluated with the "druggability" 11 scores: SiteScore and Dscore, as derived from the SiteMap tool implemented in the Schrödinger 1 suite 2019-4 (Schrödinger, LLC, New York, NY, 2019) and the CNN and LR (Convolution Neural 2 Network and Linear Regression) druggability models 41 as implemented in the TRAPP package. 42 3 Although TRAPP and SiteMap use different approaches for computing the pocket characteristics 4 (3D grid-based versus residue-based), the trends in the computed parameters are similar. 5 From this analysis, we conclude the following: 6 (i) The binding site volumes computed with TRAPP for the holo X-ray crystal structures are 7 distributed over a slightly larger and more variable range of values than that for the apo X-ray 8 crystal structures (Figure 2A , Figure S3 ). The distribution of volumes is higher in the MD and 9 enhanced sampling simulations: for instance, in the MD10000 ensemble, the volume of the apo 10 protein has a variation of as much as 48% ( Figure 2A ) with respect to the average value. Similar 11 trends were observed for volumes computed with SiteMap (see Table S2 ). This difference in 12 volume distributions between X-ray crystal structures and MD snapshots could be caused by 13 crystal packing. We therefore calculated the minimum distance between two crystallographic 14 symmetry images, as well as the minimum distance between the binding site residues and the 15 nearest image (see Section S3.2). Depending on the space group, the minimum distance between 16 a non-hydrogen atom in the binding site and an image atom turns out to span between ~2.4 Å 17 and ~9.3 Å (Table S3, Figure S5 ). Therefore, crystal packing might constrain the binding site of 18 some of the crystal structures to more compact conformations. 19 During the MD simulations, on the other hand, the large range and variability in binding site 20 volume are associated with conformational changes of loops ASL1 (res. 22-53) and ASL2 (res. 21 184-194) ( Figure 2B ). It can be seen that ASL1 is more flexible than ASL2, from the MSM analysis 22 (Section S2) and by calculation of the residue occurrence in the binding site (Section S3, Figure 23 S3). Volume variability also results from the transient participation (with a frequency of ~25%) of 24 the N-and C-terminal tails of the adjacent subunit in the binding site ( Figure 2C ). During the MD 25 simulation, these two termini move in the proximity of the pocket. For the seven apo X-ray crystal 26 structures, where the termini are resolved (Table S1 ), the C-terminus is never close to the binding 27 sites, whereas the N-terminus is always so (see Figure 2D ). In the holo crystal structures, the N-28 and C-terminal tails of the adjacent subunit can both be found close to the binding site (see Figure 29 2D). A similar scenario is observed for the holo structures in the LRIP/tC ensemble, where both 30 terminals are present in the binding pocket even more often (in about 40% of simulated structures, 31 Figure S3 ). 32 Binding site residues in the 100 µs-long MD trajectory: average occurrence in snapshots at 1 µs intervals for chains A and B 2 separately. Unlike chain A, extensive conformational changes are observed in the loop formed by residues 181-194 of chain 3 B (see Figure S4 ). (D) Average occurrence of the binding site residues in the set of apo and holo X-ray structures (Table S1 ). Residues from the same chain are shown in blue, while residues from the adjacent chain are colored in orange. The 5 percentage of each residue was calculated considering the number of structures for which that residue was resolved. 6 7 (ii) The binding site hydrophobicity is here estimated in terms of hydrophobicity distribution across 8 the different conformational ensembles, calculated with TRAPP. The distribution of the MD 9 ensemble (based on the apo structure) can be fitted with a Gaussian centered around a 10 hydrophobicity ~55 on the TRAPP scale 41 ( Figure S6A ). Similarly, the apo X-ray crystal structures 11 could be fitted with a Gaussian that is shifted to a peak at a higher hydrophobicity of ~65 ( Figure 12 S6A). The distribution of the holo X-ray crystal structures exhibits three peaks: one at lower 13 hydrophobicity (~42-55), one at the apo X-ray crystal structures hydrophobicity (~60-70), and one 14 at high hydrophobicity (~80-90) ( Figure S6A ). above the scores' thresholds for druggability (0.9, 0.9, 0.5, 0.8 for SiteMap, DScore, LR and CNN 26 models of TRAPP, respectively, Figure S6 . These thresholds were taken from Halgren et al. 44 ). 27 There is not, however, any notable correlation between the druggability scores from the 28 SiteScore/DScore and TRAPP methods. This is expected because the observed variations in the 29 druggability index is within the method prediction uncertainty. Despite the slightly lower 30 druggability indices in SiteScore and TRAPP-LR for the apo X-ray crystal structures compared to 31 the holo crystal structures, they are still predicted to be druggable within the uncertainty of the 32 methods. In contrast, about 50% of the simulated structures (MD, LRIP, and tConcoord) were 33 predicted not to be druggable ( Figure S6 ). 34 The druggability scores of the simulated, and, more, of the X-ray crystal structures correlate with 35 binding site hydrophobicity ( Figure S6 , Section S3.3, and Table S4 , S5). The correlation with other 36 binding site features is much smaller (see Table S4 , S5). We conclude that, as expected, the 37 more hydrophobic the pocket is, the more druggable it is. 38 39 2. Virtual screening. 40 We defined a sample library of a total of 13,535 compounds (Table 1, Figure S7 ). The library 41 included commercialized drugs and compounds under development, the internal chemical library 42 from Dompè pharma company and compounds from the Fraunhofer Institute BROAD 43 Repurposing Library, as well as known inhibitors of SARS-CoV-1 Mpro. In particular the library 44 included a set consisting of 180 compounds with pIC50 against SARS-CoV Mpro greater than 6 1 reported in the literature (active molecules, hereafter). 12, 16, Our sample library is chemically 2 very diverse as compared to the crystallographic ligands in complex with SARS-CoV-2 Mpro and 3 the active molecules (see Figure S7 ). A more detailed chemoinformatics analysis is reported in 4 Section S4.1-2. 5 We selected SARS-CoV-2 Mpro conformations with scores above the druggability thresholds. 6 These include all the X-ray crystal structures except 6WTK (42 structures), 10 MSM ensemble 7 conformations (MSM selection), and 8 representatives of the top 10% scoring conformations from 8 the MD10000 and LRIP/tC ensembles (see Table 1 , Section S4.3, Table S6 , Figure S8 -9). 9 The ligands were screened against the conformations using OpenEye FRED 68 and Schrödinger 10 Suite Glide Version 85012. 69, 70 We discuss here the results obtained with FRED. Those obtained 11 with Glide present similar trends and are reported in the Supporting Information (Section S4.4 12 and Figure S10 ). Also, we report here only the calculation results obtained with the 13 MD10000/LRIP/tC and X-Ray selections. The data obtained with the MSM selection are reported 14 in the SI (Section S4.5, Figure S11 ). 15 The quality of the virtual screenings was evaluated in terms of: (i) Enrichment Factor (EF) defined 16 as EF(1%) = ( #active molecules in the top 1% / #molecules in the top 1% ) / ( active molecules 17 in the whole set). (ii) Receiver Operating Characteristic (ROC) curves, used to evaluate the true 18 positive rate and the area under the curve (AUC). 19 The structures from the MD10000/LRIP/tC and MSM selections, along with the apo X-ray 20 structures, exhibited a poor EF (below 5%), despite being identified as druggable by all the 21 druggability prediction methods here implemented ( Figure 3 ). The poorer performance of the apo 22 X-ray crystal structures was expected since they exhibited overall lower druggability scores with 23 respect to the holo structures ( Figure S6F ). This was not the case with the selected structures 24 from the MD10000 and LRIP/tC ensembles (see Section S4.3), where only the 10% of the 25 structures with highest druggability scores were used for screening. This suggests that the 26 druggability prediction methods are not sensitive enough to distinguish between high-and low-27 EF conformations for the chemical space considered (see also discussion below). On the other 28 hand, for the holo X-ray crystal structures, both "well-performing" (EF > 15%) and "poorly-29 performing" (EF < 5%) conformations were identified ( Figure 3 ). Next, we determined which of these ensembles exhibits a Protein-Ligand Interaction Fingerprint 9 (PLIF) comparable to the one established by the ligands co-crystallized with SARS-Cov2 Mpro. 10 In the PLIF of the latter ( Figure 4A ), we observe an overall predominance of Hbond interactions 11 over hydrophobic ones, with Cys145 (catalytic dyad), Gly143, Ser144, His163 and Glu166 as the 12 most attractive residues to form Hbond interactions (comparable occurrence) with the ligands. 13 The only exception is represented by His41 (catalytic dyad), which is similarly involved in Hbonds 14 and hydrophobic interactions. The PLIF of the well-performing conformations ( Figure 4B , upper panel) matches the one of the 14 crystallographic complexes well ( Figure 4A ). Namely, the same hot-spot (i.e. preferential residues 15 for ligand binding) residues emerge: the ligands form Hbonds with Cys145 (catalytic dyad), 16 Gly143, Ser144, Gly166, and His163, as well as hydrophobic interactions with His41 (catalytic 17 dyad). The main difference between the two PLIFs is in the lower occurrence of Hbonds involving 18 Gly143, Ser144 and Cys145 (catalytic dyad) than in the crystallographic complexes, and in the 19 higher occurrence of hydrophobic contacts with Thr25 and His41 (the second residue in the 20 catalytic dyad). This change in the surrounding of the reactive cysteine, Cys145 (Thr25, Gly143, 21 Ser144) might be due to the presence of several covalent ligands in the crystal structures. 22 Covalent binding might locally alter the PLIF, and this effect is not considered in the virtual 23 screening. 24 In all the other selections (i.e. poorly-performing X-ray structures Figure 4B lower panel, MD10000 1 and LRIP/tC selection Figure 4C , and MSM selection Figure S11 ), the key Hbonds above 2 discussed have an occurrence that is markedly lower than non-specific hydrophobic interactions. 3 Also the latter interactions substantially decrease, including those with His41 (catalytic dyad). 4 Moreover, in the MSM, MD10000 and LRIP/tC selections, ligands interact with almost all residues 5 of the binding site ( Figure 4C and Figure S11 ), but with an occurrence below 25% (for each 6 interaction type) and with a strong predominance of hydrophobic interactions versus HBonds. 7 This points to a rather non-specific binding of the screened molecules in the MD/MSM-selected 8 structures. 9 Summarizing, we found that our evaluation of the virtual screening procedure correctly identifies 10 the conformations able to provide the most similar PLIF to the known crystallized ligands of SARS-11 CoV2 Mpro. 12 To rationalize why the binding site structural determinants cause such dramatic differences in the 13 PLIF of crystal structures (i.e. poorly/well-performing), MSM, MD10000 and LRIP/tC selections, 14 we compared the average binding site shapes for the different selections. We found that the 15 residues in the MSM, MD10000 and LRIP /tC selections are distributed over a larger volume than 16 in the crystal structures ( Figure 4D -F); therefore, the spatial location of the hotspots (i.e. HBond 17 donors/acceptors, hydrophobic patches, charges) is significantly different. MSM, MD10000 and 18 LRIP/tC selections were composed of structures with high druggability scores (see Section S3.3), 19 suggesting that the druggability scores are, in this case, unable to identify the binding site features 20 responsible for good performances (defined here as EF and AUC, see above) in virtual screening. 21 Even re-selecting the conformations from the MD-ensembles using the similarity with respect to 22 well-performing structures as criterion (i.e. Root Mean Square Deviation, RMSD), did not provide 23 a satisfying performance (Section S5, Figure S12 ), indicating that key features for obtaining high 24 enrichment factors in the virtual screening, i.e. the precise placement of interacting residues, are 25 missing. 26 Let us now compare the average binding site shape of the well-versus the poorly-performing 27 crystal structures: two adjacent cavities can be observed on the well-performing structures ( Figure 28 4D): one including Glu166, Pro168, Gln189, Thr190 and Gln192, that is missing in the poorly-29 performing structures ( Figure 4E ); the other including Thr25 Leu27, His41, Asn142, Cys145, 30 Met165, Glu166 and Gln189 that is also present in the poorly-performing structure, but wider and 31 less deep than in well-performing ones. Therefore, interactions with Pro168 only appear in the 32 well-performing structures, and the occurrence of Gln189 is much higher in well-performing than 33 in poorly-performing ones. Notably, the binding site shape of the poorly-performing structures 34 shares strong similarities with that of the apo structures ( Figure 4G ). 35 This difference in binding site shape is a consequence of the fact that in the well-performing 36 structures, the small helix near P2 group (residues 46-50) and the β-hairpin loop near P3-P4 37 substituents (residues 166-170) shift about 2.7 Å apart with respect to the poorly-performing 38 ones, whereas the P5 loop (residues 190-194) moves closer to the P3-P4 loop. Also, two 39 methionines, Met49 and Met165, change their side-chain orientation impacting the side chain 40 positions of the overall P2 loop, as well as the exposure of His163 ( Figure 4H ); the interaction 41 with the latter significantly decreases in the poorly-performing structures, as already discussed 42 above. These differences are similar to those observed between the well-performing and the apo 43 structures ( Figure S13 ). 44 Taken together, these results may explain why the druggability indices are unable to distinguish 1 the binding site features linked to a good performance in virtual screening: very subtle variations 2 of the conformations of the binding site residues induced upon binding, and therefore not present 3 in the structural ensembles generated in the absence of a ligand, can lead to significant 4 differences in the EF. These subtle variations are very challenging to discriminate in terms of 5 druggability indices. 6 7 3. The active-pharmacophore and the SARS-CoV-2 blueprint on the chemical space. 8 To identify the relevant ligand features that might relate to high affinity binding, we extract here 9 the consensus chemical space, defined by the common ligands across the top 1% of the well-10 performing structures in the virtual screening. These are 32 molecules, 16 of which belong to the 11 "active" molecules in our library ("consensus active" hereafter, Figure S14 and Table 1 ). We next 12 calculated the corresponding pharmacophore (i.e. an ensemble of steric and electronic features 13 that is necessary to ensure the optimal supramolecular interactions with a specific biologic target) 14 and linearly combined it with the X-ray pharmacophore. This is done to consider possible 15 additional features coming from covalent binding that cannot be covered by the virtual screening 16 protocol. The consensus pharmacophore combined with the X-ray pharmacophore constitutes 17 the "active-pharmacophore", hereafter (see Methods, Figure 5) . 18 Next, we tested the predictive power of our active-pharmacophore in discriminating the higher 19 affinity binders across all the so-far known SARS-CoV-2 Mpro inhibitors: these are 46 molecules 20 coming from the papers published until November 20 th 2020, which were not included in our 21 sample library and that display a measured affinity spanning from 30 nM to 125 µM 17, 31, [71] [72] [73] [74] [75] [76] [77] ). 22 Some of these molecules were excluded by the docking software due to their excessive size or 23 due to the presence of metals (e.g. candesartan cilexetil, Evans blue, phenylmercuric acetate). 24 For this purpose, we calculated the Dice coefficient, which measures the number of features in 25 common between the molecule and the active-pharmacophore, relative to the average number of 26 features present. 78 When scoring the 46 known SARS-CoV-2 Mpro binders according to the Dice 27 coefficient, the highest scored molecules were 11a, 11b, 11r, UAWJ246, UAWJ247, UAWJ248 28 and CG373 (see Figure S15 ), which are also the highest affinity (IC50 ≤ 400 nM) SARS-CoV-2 29 Mpro binders (nM-binders, hereafter). On the other hand, it is not possible to discriminate the sub-30 µM-binders of SARS-CoV-2 Mpro (400 nM < IC50 ≤ 1000 nM) from the µM ones (IC50 > 1000 31 nM) (see Figure S16 ). Of course, these results have to be taken with care given the fact we are 32 comparing assay-dependent IC50 values coming from different labs. Also, several of these 33 inhibitors are predicted to be covalent binders and, therefore, their IC50 values can be 34 controversial (see the discussion offered in the Limitation paragraph). Therefore, in the next 35 section, we analyzed the chemical space shaped by the well-performing conformations upon 36 ligand binding, and offer a rationale for the predictive power of our active-pharmacophore in 37 identifying nM-binders of SARS-CoV-2 Mpro. 38 transferability. The PLIF of the consensus chemical space is dominated by the "consensus 41 active" ones (the latter PLIF is almost identical to the former one, Figure S17A ) and it shows a 42 predominance of HBond interactions with His163, Glu166, Gly189 and Cys145 ( Figure 5A ), as 43 well as hydrophobic interactions with Thr25, His41, Met165, Pro168 and Gln189. Accordingly, the 1 same trends can be seen for the PLIF of the SARS-CoV-2 Mpro nM-binders ( Figure 5B ), the only 2 differences being (i) the additional high occurrence of Gly143 and Ser144 Hbonds (as already 3 observed in the PLIF of the X-Ray structures), and (ii) the lower occurrence of hydrophobic 4 interactions with Thr25. 5 When comparing the binding poses of the "consensus active" molecules and the nM-binders of 15 SARS-CoV-2 Mpro (Section S6), we found indeed that: the indole group (in 11a, 11b and 7 of the 16 16 molecules of the "consensus active" set) or the benzyl group (in GC373, 11r and 6 of the 16 17 molecules of the "consensus active" set), or the benzimidazole group (in 1 of the 16 of the 18 "consensus active" set) is buried in the upper sub-cavity defined by residues Glu166, Pro168, 19 Gln182, Gln189 and Thr190 ( Figure 5E-F) . Notably, this cavity was shrunk in the poorly-20 performing structures, further validating the quality of our model that correctly excluded the 21 conformations potentially incompatible with nM-binders. The benzothiazole moiety of the 22 "consensus active" is instead located in the lower part of the binding cavity defined by Thr24, 23 Thr25 and Thr26. This benzothiazole moiety is absent in the SARS-CoV-2 Mpro nM-binders, 24 possibly explaining the lower occurrence of hydrophobic interactions with Thr25. 25 This suggests that the binding to the lower part of the binding site (Thr24, Thr25, Thr26) is not a 1 relevant feature for the nM affinity of SARS-CoV-2 ligands. In contrast, the high occurrence of 2 Gly143 and Ser144 Hbonds appears to be a signature of nM-binders of SARS-CoV-2 Mpro, also 3 found in the PLIF of the known X-ray ligands of SARS CoV-2 complexes. Notably, the formation 4 of these two Hbonds appear to be significantly hampered in the "consensus active" set due to the 5 presence of the above-mentioned benzothiazole moiety, that seems to compromise the 6 juxtaposition of the Hbond acceptors of the ligands. Accordingly, none of the SARS-CoV-2 nM-7 binders, display benzothiazole or analogous bulky aromatic groups in such a position ( Figure 8 S14). 9 Analysis of all the 166 holo SARS-CoV-2 Mpro crystal structures also showed that the majority 10 of their ligands do not have a benzothiazole or analogous bulky aromatic groups in the Thr24, 11 Thr25, Thr26 subpocket ( Figure S18 ). The two exceptions are PDB IDs 7JKV and 6XR3, where 12 the ligands are covalently bound and the bulky group appear twisted in the cavity in a different 13 position with respect the predicted the other X-rays and docking poses: this is possibly due to 14 ligand-dependent conformational rearrangement upon covalent bond formation with Cys145. 15 Concerning the µM binders known so-far, only very few of them are predicted to have a bulky 16 group in such a position ( Figure S16 ). In other words, our results suggest that the bottom part of 17 the binding cavity in SARS-CoV-2 Mpro should only host small aromatic/hydrophobic moieties (or 18 nothing at all) to facilitate the formation of Gly143 and Ser144 Hbonds, the latter being a signature 19 of the currently known nM-binders to SARS-CoV-2 Mpro. 20 Recently, the X-ray structure of GC373 in complex with SARS-CoV-2 Mpro was solved (PDB ID: 21 7BRR, recently superseded by PDB ID 7D1M (released on October 28 th 2020). The ligand in the 22 crystal structure appears in two different conformations, one resembling the predicted pose from 23 us, where the benzyl group of GC373 is not buried in the upper sub-cavity defined by residues 24 Glu166, Pro168, Gln182, Gln189 and Thr190; The other where this benzyl group it is exposed 25 toward the solvent. Yet, when the crystallographic complex undergoes 500 ns of MD simulations, 26 the pose where the aromatic ring is exposed toward the solvent rearranges as in the predicted 27 docking pose (see Section S7 and Figure S19 ). Such results further validate our active-28 pharmacophore. 29 We next calculated the PLIFs of both the sub-µM ( Figure S17B ) and µM inhibitors ( Figure 5C ) of 30 SARS-CoV-2 Mpro, which we were unable to discriminate with our active-pharmacophore. We 31 found that indeed the two PLIFs are very similar and they both feature a drastic decrease of the 32 occurrence of all the key HBonds found by the nM-binders of SARS-CoV-2 Mpro. The sub-µM 33 and µM PLIFs turn out to resemble the PLIF of the "active" molecules that were not in our 34 consensus group (i.e. nM-binders for SARS-CoV Mpro predicted to be low affinity binders for 35 SARS-CoV-2 Mpro, Figure S20A , Section S8). Indeed, the "active" molecules excluded by our 36 consensus present differences that may hamper a positive scoring and, possibly, a successful 37 binding to the receptor, like an excessively long peptidic chain (i.e. Figure S20 , molecules a and 38 d), bulky substituents (i.e. Figure S20 , molecules a and b) or a five membered ring rarely observed 39 in Mpro inhibitors (i.e. Figure 20 , molecule c). 40 This suggests that our active-pharmacophore is not only able to discriminate the nM-binders of 41 SARS-CoV-2 Mpro (IC50 ≤ 400 nM) from the rest, but it also identifies key specific transferable 42 and not-transferable binding features of nM SARS-CoV Mpro binders to SARS-CoV-2 Mpro ones. 43 Taken together, these results suggest that our active-pharmacophore is a fair representation of 44 the SARS-CoV-2 Mpro blueprint in the chemical space. Namely, it correctly represents a set of 45 binding features compatible with the induced SARS-CoV-2 conformational space of the binding 1 site. The latter is in part determined by the ligand upon binding and in part it depends on the 2 residues differing in SARS-CoV-2 Mpro with respect to SARS-CoV Mpro, as also shown in 3 Bzówka et al. 18 4 5 We considered a set of publicly available compounds within the E4C network, 79 from the EU-7 OPENSCREEN Bioactive CompoundLibrary, 80 coming from the PROBE MINER repository. 79, 81 8 The set was re-scored based on the Dice similarity of their docked pose to our active-9 pharmacophore (see Methods). Benserazide (EOS100736) and Myricetin (EOS100814) 10 compounds (see schemes in Figure 6A , B) were predicted as nM-binders of SARS-CoV-2 Mpro 11 candidates. SARS-CoV-2 Mpro biochemical assays performed here established the accuracy of 12 our predictions by measuring IC50 values as low as 140 nM and 220 nM, respectively (see 13 Section S9, Figure S21 ). 14 The docking pose of Myricetin, as coming out from our virtual screening procedure, shows an 15 orientation which is comparable to the one observed in the newly solved X-ray structure with PDB 16 ID 7B3E (resolution 1.77 Å, see Figure 6C ). In this pose, the bicyclic ring in the two structures 17 nicely overlap, while the 3,4,5-trihydroxyphenyl moiety is rotated in our predicted pose with 18 respect to the crystallographic one. By refining the docking pose (see Section S9.1 and Figure 19 6C) both the bicyclic ring and the 3,4,5-trihydroxyphenyl moiety assumes an orientation nearly 20 identical to the one found in the X-ray pose after covalent binding with Cys145, with an overall 21 RMSD of 0.46 Å between the refined predicted pose and the crystallographic one. Interestingly, 22 Baicalin features the same isoflavon scaffold as Myricetin, yet it binds the protein with a different 23 orientation, as shown by X-ray studies (PDB ID 6M2N). Our procedure predicted such orientation, 24 although the overall binding pose differed more significantly from the X-ray one than that with of 25 Myriecitin (see Section S9.2, Figure S22 ). 26 Myricetin and Benserazide contain polyhydroxy-phenolic moieties, which are considered 27 promiscuous due to their redox features but also to the presence of a high number of close Hbond 28 acceptor/donor sites that allow them to satisfy several 3D-pharmacophores. Nonetheless, these 29 compounds have, respectively, reached approved clinical usage (i.e. for Parkinson's disease 82 30 and alcohol use disorder 83 ) and are in use in our diet like other polyhydroxyphenol-containing 31 products. 84 Also, Quercetin, structurally similar to Myricetin, was identified mild inhibitor of SARS-32 CoV-2 Mpro (KI ~ 7 μM). 85 Mpro inhibitors in the nM range (IC50 ≤ 400 nM). This contrasts with SARS-CoV Mpro, for which 20 127 nM inhibitors are known in this range. 46, 47, 52, 54, 56, [60] [61] [62] 65 The observed difficulties in identifying 21 potent SARS-CoV-2 Mpro inhibitors was suggested to arise from the large plasticity of the binding 22 site, 18 along with other factors (also observed for SARS-CoV Mpro), including induced-fit 23 conformational changes and formation of covalent bonds upon ligand binding 30 . Therefore, the 24 available binding space can differ significantly from ligand to ligand. 25 Accounting for receptor binding site flexibility in molecular docking is a significant challenge. This 26 can be partially overcome with a careful choice of the most appropriate receptor and reference 27 ligand(s) or by performing ensemble docking approaches. 86, 87 While it seems logical to employ 28 multiple protein structures and ligands where available, very few published studies have 29 systematically evaluated the impact of using additional information on proteins and ligands' 30 structure. 88 These studies arrive at the conclusion that alternative structure-based design 31 approach may be to define pharmacophores based on the binding site and use them to search 32 large chemical databases. 89, 90 33 Our paper exploits the particularly large amount of structural information available for Mpro (~200 34 X-ray structures in the apo and in the holo forms), along with a very long MD simulation from the 35 D. E. Shaw group 37 and structures generated here by enhanced sampling of the binding site 36 dynamics. First, we have determined the potential druggability of each of the ~30,000 Mpro 1 conformations generated from these sources, as calculated using TRAPP and SiteMap 2 druggability tools. 38, 42, 43 Next, we have used a sample library to understand how selected 3 potential high-druggable protein conformations perform when probed with a diverse chemical 4 space, here defined by 13,000 compounds (see t-SNE plot in Figure 7 , and methods for details). 5 6 Our library included also "active" molecules, i.e. molecules that are known to bind with nM affinity 7 (pIC50 > 6) to SARS-CoV Mpro. Therefore, virtual screenings against ~200 protein conformations 8 were performed. We found that only a few of these highly-druggable ("well-performing") 9 conformations recognize a sufficiently high percentage of such "active" molecules: The ones in 10 common among across the well-performing structure, "consensus active" molecules (16 in total) 11 represent a small subgroup of the overall "active" molecules' ensemble with specific features. In 12 particular, the "consensus active" (16 molecules) are mostly clustered in two main areas of the t-13 SNE plot, both corresponding to peptidomimetic structures but differing from each other by the 14 presence of a benzothiazole moiety and an additional peptide bond. The specific protein-ligand 15 interaction fingerprint (PLIF) of the "consensus active" molecules strikingly resembles the one 16 emerging from the SARS-CoV-2 Mpro co-crystalized ligands. The latter indeed cluster in the same 17 region of the t-SNE plot. Notably, no consensus was found for the poorly-performing structures. 18 We then combined the crystallographic and the virtual screening PLIFs (i.e. the chemical space 19 emerging from experimental structures and the chemical space selected upon virtual screening). 20 Within the limitations of the procedure (discussed in the Limitation paragraph), we obtained an 21 "active-pharmacophore" that we first used against a selection of SARS-CoV-2 Mpro binders (46 22 molecules): the latter are very diverse and they are spread overall the t-SNE plot (Figure 7) . The 23 active-pharmacophore could predict known nM-binders for SARS-CoV-2 Mpro (12 molecules out 24 of the total of 46), which are also clustered in the peptides and peptidomimetics region of the t-25 SNE plot, and discriminate these from the µM ones. Moreover, it could also discriminate the 26 transferable from the non-transferable binding features from SARS-CoV to SARS-CoV-2 Mpro. 27 The former include the interaction with the catalytic dyad residues along with: (i) His163, whose 28 mutation to Ala inactivates SARS-CoV Mpro, 91 (ii) Glu166, which plays a role in the dimerization 29 (required for enzymatic activity) in In addition, its interactions with the N-finger of 30 the other subunit assist the correct orientation of residues in the binding pocket for both proteins, 92, 31 93 (iii) Gln189, which correlates evolutionally with residues from the Cys44-Pro52 loop in both 32 proteins, which was shown to regulate ligand entrance to the binding site 18 in both proteins, and 33 (iv) Ser144, whose mutation to Ala hampers the catalytic activity in SARS-Cov MPro. 94 34 The non-transferable binding features include the ability to place large hydrophobic/aromatic 35 groups in the part of the cavity defined by Thr25 and Thr26 that is partially lost in SARS-CoV-2 36 Mpro compared to SARS-CoV Mpro. This appears to affect HBonds with Gly143 and Ser144. 37 Accordingly, this cavity is empty or occupied by a smaller aromatic group like a benzyl ring in all 38 the known nM-binders and the co-crystallized ligands of SARS-CoV-2 Mpro. In contrast, several 39 of the known nM-binders of SARS-CoV Mpro have benzothiazole or analogous bulky aromatic 40 groups in this position. 41 We finally used our active-pharmacophore against a public library of compounds. We predicted 42 two ligands to be nM for SARS-CoV-2: Benserazide and Myricetin. Biochemical assay 43 experiments confirmed them to be nM binders of SARS-CoV-2 ( Figure S21 ). Our predicted pose 44 of Myricetin is in very good agreement with the just solved X-structure of the complex ( Figure 6C ). 1 This was not expected since Baicalin baring the same isoflavon scaffold of Myricitin binds in a 2 reversed position (PDB ID 6M2N). Thus, the pharmacophore not only successfully predicts poses 3 in a highly flexible binding site as that of SARS-CoV-2 Mpro, but it also discriminates between 4 different orientations of quite similar scaffolds. the co-crystalized ligands are shown in red, with selected ligands shown in 2D representation on the side (labeled e-g). The 1 inset with a magnified portion of the t-SNE plot is reported at the bottom of the figure. The "active" molecules appear to be 2 more chemically diverse than the SARS CoV-2 Mpro co-crystallized ligands since they are spread over all the t-SNE plot, 3 while the co-crystallized ligands mostly cluster in the bottom part of the plot. This region corresponds to peptides covalently 4 bound to SARS-CoV-2 Mpro-C145 (PDB ids: 6LU7, 6LZE, 6M0K, 6WTJ, 6WTK, 6WTT, 6XA4, 6XBH, 6XBG, 6XBI, 6Y2F, 5 6Y2G, 6YZ6, 6Z2E, 7BQY, 7BRR, 7C8R, 7C8T, and 7C8U, see Table S1 ). 6 Our methodological approach also demonstrates that only a small fraction of the binding sites of 7 the apo protein from crystal structures or simulations are similar to those of the well-performing 8 holo structures. However, even the conformations with high structural similarity and high 9 druggability scores, generated by molecular dynamics simulations, yielded low enrichment factors 10 in virtual screening. Thus, druggability assessment methods fail to discriminate between small 11 structural variations of the binding site that lead to successful performance in virtual screening. 12 These small structural differences significantly impact ligand binding predictions as observed in 13 our virtual screening campaigns. They could also be a source of disappointing results in other 14 virtual screening campaigns carried out so far by research groups world-wide. 22, 28, 32, 33, 95 These 15 observations indicate that there is space to improve the discriminatory ability of druggability 16 scores by training on a wider range of structures generated by simulation as well as 17 crystallography. Moreover, they highlight the need to develop simulation methods to generate 18 holo-like protein structures for virtual screening. 19 This work was carried out within the framework of EXSCALATE4CORONAVIRUS (E4C) 79 project. 20 E4C aims to exploit the most powerful computing resources currently based in Europe to 21 empower smart in silico drug design applied to the 2019-2020 SARS-CoV-2 coronavirus 22 pandemic, while increasing the accuracy and predictability of Computer-Aided Drug Design 23 (CADD). We here used 400,000 core-hours on the JURECA supercomputer in the Jülich 24 Supercomputing Centre. Advanced CADD in combination with high throughput biochemical and 25 phenotypic screening are allowing the rapid evaluation of simulation results and the reduction of 26 time for the discovery of new drugs. 96 The work presented here indeed shows how a 27 computational procedure combined with experimental validation can correctly predict structure 28 and affinity trends of effective hit molecules for a given target. This kind of combined approaches 29 may be a key strategy especially against pandemic viruses and other pathogens, where the 30 immediate identification of effective treatments is of paramount importance. 31 Expression and enzymatic activity. SARS-CoV-2 Mpro (ORF1ab polyprotein residues 3264-34 3569, GenBank code: MN908947.3) has been produced, purified and used as described in Zhang 35 et al. 17 The detection of enzymatic activity of the SARS-COV-2 3CL-Pro was performed under 36 conditions similar to those reported by Zhang et al. 17 Enzymatic activity was measured by a 37 Förster resonance energy transfer (FRET), using the dual-labelled substrate, DABCYL-38 KTSAVLQ↓SGFRKM-EDANS (Bachem #4045664) containing a protease specific cleavage site 39 after the Gln. In the intact peptide, EDANS fluorescence is quenched by the DABCYL group. 40 Following enzymatic cleavage, generation of the fluorescent product was monitored (Ex 340 nm, 41 Em 460 nm), (EnVision, Perkin Elmer). The assay buffer contained 20 mM Tris (pH 7.3), 100 mM 42 NaCl and 1 mM EDTA. The assay was established in an automated screening format (384 well 43 black microplates, Corning, #3820) 44 45 Primary screen and dose response. In the primary screen, test compounds (stock at 10 mM in 1 100 % DMSO), positive (Zinc-Pyrithione [medchemexpress, Cat. No.: HY-B0572] 10 mM in 100 2 % DMSO) and negative (100 % DMSO) controls, were transferred to 384-well assay microplates 3 by acoustic dispensing (Echo, Labcyte). Plate locations were: test compounds at 20µM final 4 (columns 1 to 22); positive control Zinc-Pyrithione at 10 µM final (column 23); and negative control 5 0.2 % v/v (column 24). 5 µl of SARS-CoV-2 Mpro stock (120 nM) in assay buffer were added to 6 each plate well and incubated with the compounds for 60 min at 37 °C. After addition of 5 µl 7 substrate (30 µM in assay buffer), the final concentrations were: 15 µM substrate, 60 nM SARS-8 CoV-2 Mpro, 20 µM compound, and 0.2% DMSO, in a total volume of 10 µL/well. The 9 fluorescence signal was then measured at 15 min and inhibition (%) calculated relative to controls. 10 To flag possible optical interference effects fluorescence was also measured 60 min after 11 substrate addition, when the enzymatic reaction was complete. Results were normalized to the 12 100 % and 0 % inhibition controls. Subsequently, the binding pocket's druggability in each of the generated protein conformations 44 was computed (using LR and CNN methods, 41 see below for more details). Additionally, we 1 computed the druggability for 10,000 snapshots taken at 10 ns intervals from the 100 µs standard 2 MD trajectory generated by starting from the crystal structure with PDB ID 6Y84 by D. E. Shaw 3 Research. 37 4 5 Generation of binding pocket conformational ensembles using LRIP and tConcoord. We 6 based the conformational ensemble generation on the high-resolution X-ray crystal structures 7 with PDB IDs 6LU7 (2.16 Å resolution 103 ) and 6Y2G (2.20 Å resolution 17 ). All 306 residues of 8 SARS-CoV-2 Mpro are resolved in PDB ID 6LU7; for PDB ID 6Y2G, the unresolved C-terminal 9 residues (see Table S1 ) were modeled using PyMol. 104 All TRAPP analysis was conducted on 10 homodimeric structures, generated with the symmetry wizard of PyMol. 104 All heteroatom records 11 were removed from the protein structures and hydrogen atoms were added at a pH of 7.4 using 12 the CHARMM force field 105 by assigning a distance of 3.5 Å around all atoms of the ligand N3 from PDB ID 6LU7. This 24 distance was used to detect residues that potentially may contribute to the binding site and to 25 define dimensions of a 3D grid that was then used to compute the binding pocket shape. Then 26 the binding pocket for each structure was mapped on the 3D grid. The druggability score of this 27 pocket was computed using linear regression (LR) or a convolutional neural network (CNN) and 28 scaled between 0 and 1. 41 Scores were calculated for all conformations generated for 6LU7 and 29 6Y2G from LRIP and tConcoord, as well as for frames collected every 10 ns from the conventional 30 MD simulations. 37 For each structure, a set of the binding site residues that line the binding pocket 31 was detected using the procedure implemented in the TRAPP package. Specifically, each residue 32 was characterized by the number of atoms that contact with the binding pocket. 33 Structure Selection: First we selected all structures (generated by tConcoord, LRIP, or extracted 34 from MD trajectories) with the top 10% of the LR and CNN druggability score. These structures 35 were then clustered by the binding site similarity into 8 clusters using k-means procedure (see 36 Figure S8 and also Section S4.3 for more details of about 13,500 drugs were obtained. All compounds 46, 47, 52, 54, 56, [60] [61] [62] 65 were converted from 2D 3 to 3D and prepared with Schrödinger's LigPrep tool. 107 This process generates multiple sets of 4 coordinates for different stereoisomers, tautomers, ring conformations (1 stable ring conformer by 5 default) and protonation states. The Schrödinger Epik software 108, 109 was used to assign 6 tautomers and protonation states that would be more populated at the selected pH range (pH = 7 7 ± 1). Ambiguous chiral centers were enumerated, allowing a maximum of 32 isomers to be 8 produced for each input compound. OPLS3 parameters were generated for each ligand. Multiple 9 conformations for each compound were generated and a 1,000 step torsional sampling was 10 performed. The conformers were retained if the minimized energy of the conformer is within 50 11 kJ/mol of the global minimum. 110 After preparation, this resulted in 23,000 coordinate files for 12 different conformers, tautomers and protomers of the compounds in the library. 13 14 Glide Docking. The protein was preprocessed with the Protein Preparation Wizard from the 15 Schrödinger Suite version 2019-4 with the default parameters. 108, 109, 111, 112 The protonation states 16 of each side chain were generated using Epik for pH = 7 ± 2. 108, 109 All water molecules were 17 removed. Energy minimization was performed using the OPLS3 force field. 110, 113 Glide Version 18 85012 69, 70 was used for all docking calculations. Docking of the compounds, prepared as 19 described above, was performed to both active sites of the homodimer. The grids for the docking 20 were prepared using the default parameters, with the internal grid box centered on the centroid 21 of the co-crystallized ligand, when available. The external grid box was defined by checking the 22 option "Dock ligands similar in size" (~32 x ~32 x ~32 Å). The internal grid box guides the docking 23 algorithm to the region of interest, while the external grid box allows greater flexibility in ligand 24 size and orientation. The Glu166 (backbone nitrogen bound hydrogen atom as well as the 25 backbone oxygen atom), His163 (sidechain NE2 bound hydrogen atom) and His164 (sidechain 26 oxygen atom) were defined as possible Hbond constraints. A standard precision (SP) Glide 27 docking was carried out, generating 20 poses per docked molecule; During the docking 28 procedure, poses were rejected, if they were not able to fulfill at least 2 of the Hbond constraints 29 defined above. Glide score version 5.0 114 was used to rank the binding poses. This is an empirical 30 scoring function designed to reproduce trends in the binding affinity. 31 32 FRED Docking. The SARS-CoV-2 Mpro receptors were generated by OpenEye 33 Spruce4Docking, 68, 115, 116 using as input the structures pre-processed by the Protein Preparation 34 Wizard and Epik in the Schrödinger Suite. The protonation state of the receptor was not altered. 35 The OpenEye Docking suite performs rigid docking of pre-generated conformers. Here, these 36 conformers were generated from the prepared library (see above) using OpenEye OMEGA. 117 37 Conformers with internal clashes and duplicates were discarded by the software and the 38 remaining ones were clustered on the basis of the root mean square deviation (RMSD In its exhaustive search, FRED translates and rotates the structure of each conformer within the 2 negative image of the active site, scoring each pose. The first step has a default translational and 3 rotational resolution of 1.0 and 1.5 Å, respectively. The 100 best scoring poses were then 4 optimized with translational and rotational single steps of 0.5 and 0.75 Å, respectively, exploring 5 all the 729 (six degrees of freedom with three positions = 36) nearby poses. The best scoring 6 pose was retained and assigned to the compound. The binding poses were evaluated by using 7 the Chemgauss4 scoring function implemented in OpenEye. 68, 115, 116 8 9 Molecular Fingerprint generation. Chemical circular fingerprints used for the t-SNE plots in this 10 work were generated using the RDKIT package in an in-house python script. 118 aromatic edge-to-edge, Hbond acceptors and donors, and ionic interactions) with the ligand. The 20 active-pharmacophore was derived by the average of the fingerprint of the "active" molecules in 21 the consensus, docked on the well-performing structures, with the fingerprint of the co-crystallized 22 ligands in the X-ray structures considered in this work. The resulting vector components were 23 then rounded according to a cutoff in order to form a bit vector representation. The most suitable 24 cutoff value was derived by iterating cutoff values between 0.1 and 0.9 and calculating the Dice 25 similarity index between the resulting bit vector and the PLIF generated by the docking poses of 26 the known SARS-CoV-2 Mpro binders. The cutoff, which would result in the highest dice similarity 27 when compared to the nanomolar affinity ligands was chosen. The molecules used for generation 28 of the crystal structure consensus fingerprint slightly overlap (7 out of 36 crystal ligand 29 conformations) with the sub-µM affinity ligands that are scored with the dice coefficient. When 30 removing these from the query, the pharmacophore is still able to locate the higher affinity ligands 31 in the top positions. The same Dice was screened against a selected number of EU-32 OPENSCREEN ERIC Bioactive Compound Library. The PLIF representation in the form of 33 stacked histograms used in this work was generated using python matplotlib library (v 3.3.1). 122 34 In the histograms, each interaction is represented by a bar, whose height is proportional to the 35 number of times the interaction was observed among the group of molecules Limitations. As with any modelling study, our models also have limitations. Protein mobility is 2 most probably increased by the presence of moving and displaceable water molecule 123 . In the 3 molecular screenings, individual water molecules, which may be critical in the binding process, 4 were not accounted for. Solvation effects can account for up to 100-fold difference in binding 5 affinity (corresponding to ~3 kcal/mol in binding free energy 124 ). Also, we in part rely on scoring 6 functions to rank and select the best binding poses. Current docking/scoring methods 124, 125 were 7 suggested to provide reasonable predictions of ligand binding modes, but their performance is 8 often disappointing in predicting ligand binding affinities. Additionally, those methods are often 9 system-dependent, making it very hard to decide which scoring function is suitable for the chosen 10 target protein. 945539 (Human Brain Project SGA3). G.R. A.Z., P.S. and P.C. acknowledge the E4C consortium. 8 We would also like to thank Dr. Katja Herzog (EU-OPENSCREEN ERIC) for indicating access to 9 the EU-OPENSCREEN ERIC Bioactive Compound Library data 80 . Emerging coronaviruses: Genome structure, replication, and 2 pathogenesis Structures of Two Coronavirus Main 5 Proteases: Implications for Substrate Binding and Antiviral Drug Design Coronavirus Main Proteinase 9 (3CLpro) Structure: Basis for Design of Anti-SARS Drugs Insights into 13 SARS-CoV-2 genome, structure, evolution, pathogenesis and therapies: Structural 14 genomics approach SARS-CoV infection in a restaurant 18 from palm civet Nonhuman primate models for SARS Protease inhibitors as antiviral agents Without Its N-Finger, the Main 31 Protease of Severe Acute Respiratory Syndrome Coronavirus Can Form a Novel Dimer 32 through Its C-Terminal Domain Evidence for 35 substrate binding-induced zwitterion formation in the catalytic Cys-His dyad of the SARS-36 CoV main protease Structure-Based Design, Synthesis, 40 and Activity Assessment Crystal structure of 43 SARS-CoV-2 main protease provides a basis for design of improved α-ketoamide 44 inhibitors Structural and 47 Evolutionary Analysis Indicate That the SARS-CoV-2 Mpro Is a Challenging Target for 48 Small-Molecule Inhibitor Design Discovery 1 of potential multi-target-directed ligands by targeting host-specific SARS-CoV-2 2 structurally conserved main protease Rapid Identification of Potential 6 Inhibitors of SARS-CoV-2 Main Protease by Deep Docking of 1.3 Billion Compounds Inhibitors for Novel Coronavirus 10 Protease Identified by Virtual Screening of 687 Million Compounds Putative Inhibitors 13 of SARS-CoV-2 Main Protease from A Library of Marine Natural Products: A Virtual 14 Screening and Molecular Modeling Study Identification of Potent COVID-19 Main 17 Protease (Mpro) Inhibitors from Natural Polyphenols: An in Silico Strategy Unveils a Hope 18 against CORONA. Preprints.org Nelfinavir was predicted to be a potential 21 inhibitor of 2019-nCov main protease by an integrative approach combining homology 22 modelling, molecular docking and binding free energy calculation Potential inhibitors against 2019-nCoV coronavirus M protease from 25 clinically approved medicines Therapeutic Drugs Targeting Potentially highly potent drugs for Molecular Docking and Virtual Screening based prediction of drugs for COVID-34 19 Prediction of the SARS-CoV-2 (2019-nCoV) 3C-like 37 protease (3CL (pro)) structure: virtual screening reveals velpatasvir, ledipasvir, and other 38 drug repurposing candidates Structural 41 plasticity of SARS-CoV-2 3CL Mpro active site cavity revealed by room temperature X-ray 42 crystallography Feline coronavirus 45 drug inhibits the main protease of SARS-CoV-2 and blocks virus replication Computational 49 Studies of SARS-CoV-2 3CLpro: Insights from MD Simulations inhibitors: Structure-activity relationship study Discovery of 4 N-(benzo[1,2,3]triazol-1-yl)-N-(benzyl)acetamido)phenyl) carboxamides as severe acute 5 respiratory syndrome coronavirus (SARS-CoV) 3CLpro inhibitors: Identification of ML300 6 and noncovalent nanomolar inhibitors with an induced-fit binding Synthesis, 10 and Evaluation of Inhibitors for Severe Acute Respiratory Syndrome 3C-Like Protease 11 Based on Phthalhydrazide Ketones or Heteroaromatic Esters Structure-guided design of potent and permeable inhibitors of MERS 16 coronavirus 3CL protease that utilize a piperidine moiety as a novel design element Design, synthesis, 20 and bioevaluation of viral 3C and 3C-like protease inhibitors Serafini 24 MR. SARS, MERS and SARS-CoV-2 (COVID-19) treatment: a patent review Design, synthesis, 28 and evaluation of trifluoromethyl ketones as inhibitors of SARS-CoV 3CL protease Inhibition of the severe 32 acute respiratory syndrome 3CL protease by peptidomimetic α,β-unsaturated esters N-arylamido)-37 2-(pyridin-3-yl) Acetamides (ML188) as Potent Noncovalent Small Molecule Inhibitors of 38 the Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) 3CL Pr Synthesis and 42 Evaluation of Keto-Glutamine Analogues as Potent Inhibitors of Severe Acute Respiratory 43 Syndrome 3CLpro Identification and evaluation of 46 potent Middle East respiratory syndrome coronavirus (MERS-CoV) 3CLPro inhibitors Prevention and treatment of viral respiratory infections by traditional 50 Chinese herbs Profiling of substrate-specificity and 2 rational design of broadspectrum peptidomimetic inhibitors for main proteases of 3 coronaviruses An overview of severe 6 acute respiratory syndrome-coronavirus (SARS-CoV) 3CL protease inhibitors: 7 peptidomimetics and small molecule chemotherapy New developments 11 for the design, synthesis and biological evaluation of potent SARS-CoV 3CLpro inhibitors Identification, synthesis and 15 evaluation of SARS-CoV and MERS-CoV 3C-like protease inhibitors. Bioorganic & 16 medicinal chemistry Synthesis, docking studies, and evaluation 19 of pyrimidines as inhibitors of SARS-CoV 3CL protease Specific plant terpenoids 23 and lignoids possess potent antiviral activities against severe acute respiratory syndrome 24 coronavirus Discovery of novel 27 inhibitors for human intestinal maltase: virtual screening in a WISDOM environment and 28 in vitro evaluation Macrocyclic inhibitors of 3C and 3C-like proteases of picornavirus, norovirus, and 32 coronavirus Biochemical and pharmacological 35 characterization of isatin and its derivatives: from structure to activity Fused-39 ring structure of decahydroisoquinolin as a novel scaffold for SARS 3CL protease 40 inhibitors FRED Pose Prediction and Virtual Screening Accuracy Glide: a new 46 approach for rapid, accurate docking and scoring. 1. Method and assessment of docking 47 accuracy Glide: a new 1 approach for rapid, accurate docking and scoring. 2. Enrichment factors in database 2 screening Structure of Mpro from SARS-CoV-2 and 5 discovery of its inhibitors GC-376, and 8 calpain inhibitors II, XII inhibit SARS-CoV-2 viral replication by targeting the viral main 9 protease Structure-based design of antiviral 12 drug candidates targeting the SARS-CoV-2 main protease Research progress on repositioning 16 drugs and specific therapeutic drugs for SARS-CoV-2 Identify potent SARS-CoV-2 main 20 protease inhibitors via accelerated free energy perturbation-based virtual screening of 21 existing drugs Structure and inhibition 24 of the SARS-CoV-2 main protease reveals strategy for developing dual inhibitors against 25 Mpro Biochemical screening for SARS-28 CoV-2 main protease inhibitors Measures of the Amount of Ecologic Association Between Species Combined 42 use of benserazide and carbidopa in Parkinson's disease Alcohol use disorders and current pharmacological therapies: the role 46 of GABA A receptors Polyphenols in human health and disease Structural stability of SARS-CoV-2 3CLpro and identification of quercetin as an 2 inhibitor by experimental screening Ensemble 5 Docking in Drug Discovery Ensemble Docking in Drug 8 Discovery: How Many Protein Configurations from Molecular Dynamics Simulations are 9 Needed To Reproduce Known Ligand Binding? New Trends in Virtual Screening Structure-and Ligand-Based Virtual Screening on DUD-E+: 16 Performance Dependence on Approximations to the Binding Pocket AutoPH4: An Automated Method for 20 pH-dependent 24 conformational flexibility of the SARS-CoV main proteinase (M(pro)) dimer: molecular 25 dynamics simulations and multiple X-ray structure analyses Mutation of Glu-166 blocks the substrate-induced 29 dimerization of SARS coronavirus main protease Targeting the Dimerization of the Main Protease of Coronaviruses: A 32 Potential Broad-Spectrum Therapeutic Strategy Identification of novel 35 inhibitors of the SARS coronavirus main protease 3CLpro Targeting the 39 SARS-CoV-2 Main Protease to Repurpose Drugs for Exploring different approaches to improve the success 43 of drug discovery and development projects: a review Identification of slow 47 molecular order parameters for Markov model construction Improvements in Markov State Model Construction Reveal 1 Many Non-Native Interactions in the Folding of NTL9 PyEMMA 2: A Software Package for Estimation, Validation, and Analysis of 6 Markov Models VMD: visual molecular dynamics Four Binding Site Detection Algorithms Crystallographic and electrophilic fragment screening of the SARS-CoV-2 main protease CHARMM36 all-atom additive protein force field: Validation 25 based on comparison to NMR data PDB2PQR: 29 expanding and upgrading automated preparation of biomolecular structures for molecular 30 simulations Towards the comprehensive, rapid, 36 and accurate prediction of the favorable tautomeric states of drug-like molecules in 37 aqueous solution Epik: a software 40 program for pK a prediction and protonation state generation for drug-like molecules A Force Field 44 Providing Broad Coverage of Drug-like Small Molecules and Proteins Protein and ligand 48 preparation: parameters, protocols, and influence on virtual screening enrichments On the Role of the Crystal Environment in 1 Determining Protein Side-chain Conformations Development and Testing of the OPLS All-4 Atom Force Field on Conformational Energetics and Properties of Organic Liquids Extra 8 precision glide: docking and scoring incorporating a model of hydrophobic enclosure for 9 protein-ligand complexes Docking For Pose Prediction FRED and HYBRID docking performance on standardized datasets Conformer Generation 19 with OMEGA: Algorithm and Validation Using High Quality Structures from the Protein 20 Databank and Cambridge Structural Database Extended-Connectivity Fingerprints Analysis and comparison of 2D fingerprints: 27 Insights into database screening performance using eight fingerprint methods Large-Scale Systematic Analysis of 2D 31 Fingerprint Methods and Parameters to Improve Virtual Screening Enrichments Open Drug Discovery Toolkit (ODDT): a new 35 open-source player in the drug discovery field Open challenges in structure-based virtual screening: 41 Receptor modeling, target flexibility consideration and active site water molecules 42 description Accounting for water molecules in drug design. Expert Opinion 45 on Drug Discovery Small molecule docking and scoring A two-point IC50 method for evaluating the biochemical potency of irreversible 1 enzyme inhibitors A Perspective on the Kinetics of Covalent and Irreversible Inhibition